peekbank / peekbank-data-import

A collection of R scripts that turn raw datasets into the peekds format ready for ingestion by the peekbank

IMPORT kremin_2021 #133

Open alvinwmtan opened 2 months ago

alvinwmtan commented 2 months ago

Kremin et al. (2021) has two data subsets: Eng–Fra bilingual and Eng–Spa bilingual. The former is included in sander-montant_2022, but the latter is not; we will import this whole dataset separately.

alvinwmtan commented 2 months ago

IDless import complete; ready for review

alvinwmtan commented 2 months ago

Checklist for code review v2024

To start:

Common issues to check:

Trials

Trial Types

Stimuli

Subjects

General

vboyce commented 1 month ago

@alvinwmtan Is the raw data for kremin on the peekbank OSF? (I'm not able to download it and I'm not seeing it there.)

alvinwmtan commented 1 month ago

sorry, just uploaded. should be there now

vboyce commented 1 month ago

Thanks! Might I also have "demo_comp.Rda"?

alvinwmtan commented 1 month ago

ah yes sorry, forgot i had to copy it over from sander-montant_2022

vboyce commented 1 month ago

sorry to keep asking for files (@alvinwmtan), but I can't find target-distractor-pairs.csv, trial_info_fr.csv, or trial_info_sp.csv, and also the images if we have them

alvinwmtan commented 1 month ago

my bad; added them. there are wmv files that could be screenshotted to grab the images but i haven't done so—feel free to!

note also that trial_info_fr.csv and trial_info_sp.csv were constructed by me rather than from the original raw_data; it is possible but just somewhat annoying to programmatically pull together all the different pieces of info needed.

vboyce commented 1 month ago

Images could be pulled via screenshot.

The readme mentions that CDI data for the montreal subset exists (but has not been imported yet), along with DVAP, another vocab measure.

vboyce commented 1 month ago

@alvinwmtan I'm getting a validation error because of an aoi region that is all NAs -- it looks like this comes from the fact that one of the datasets has aoi coordinates and the other (hand coded) doesn't, so NAs get added to that one in a bind_rows? Does that sound right? / Do you have ideas for fixing?

alvinwmtan commented 1 month ago

this is correct (montreal has AOIs and princeton doesn't). somehow i didn't get a validation error when i ran it though? not sure why this is popping up
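
For reference, a minimal sketch of the mechanism described above, assuming dplyr is available. The two data frames are hypothetical stand-ins for the montreal and princeton subsets (made-up values, not the real import code); only `l_x_max` is a column name taken from this thread.

```r
library(dplyr)

# Hypothetical stand-ins for the two subsets; values are made up.
montreal <- data.frame(
  subject_id = c("m1", "m2"),
  l_x_max = c(700, 700)        # eyetracker subset: AOI coordinates present
)
princeton <- data.frame(
  subject_id = c("p1", "p2")   # hand-coded subset: no AOI columns at all
)

# bind_rows() matches columns by name and fills missing ones with NA
combined <- bind_rows(montreal, princeton)

# The rows that came from the subset without the column are NA
is.na(combined$l_x_max)
```

So an all-NA AOI region for the hand-coded subset is expected after the bind, which is why the NAs by themselves need not be the cause of the validation error.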

vboyce commented 1 month ago

so the validation issue seems to be less about the NAs and more about there being multiple regions in the aoi_region_set but only 1 in the trial_types? which I think I've traced back to

 if(!is.na(data$l_x_max[[1]])){

    trial_types$aoi_region_set_id <- 0

in the digest_data function at https://github.com/peekbank/peekbank-data-import/blob/33e750103cb1249b35daa01c6a7e9de1da5a6749/helper_functions/idless_draft.R#L284C1-L284C39.

Commenting out that line does seem to "fix" things, but I don't know what its purpose is -- @adriansteffan what is this line supposed to be doing?
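
A small base R sketch of why that assignment could produce the mismatch. The two tables are hypothetical miniatures (the real peekds tables have more columns), and the check at the end is only an illustration, not the actual peekds validator.

```r
# Hypothetical miniature tables; values are made up.
aoi_region_sets <- data.frame(
  aoi_region_set_id = c(0, 1),
  l_x_max = c(700, NA)
)
trial_types <- data.frame(
  trial_type_id = 0:3,
  aoi_region_set_id = c(0, 0, 1, 1)
)

# A scalar assignment is recycled across every row,
# so all trial types now point at region set 0
trial_types$aoi_region_set_id <- 0

# Illustrative consistency check (an assumption, not the peekds validator):
# which region sets are no longer referenced by any trial type?
unused_sets <- setdiff(
  aoi_region_sets$aoi_region_set_id,
  unique(trial_types$aoi_region_set_id)
)
unused_sets
```

Here region set 1 ends up unreferenced, which matches the symptom of several region sets existing while trial_types only ever uses one.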

adriansteffan commented 1 month ago

The line is supposed to remind me to read my code more carefully before committing. I removed it, thanks for the catch!