adriansteffan opened 5 months ago
The dataset passes the validator, but there are a few open points:
The emails state that they used 8 additional words in their CDI. How do we encode this, since the raw score will be skewed? Also, I am still confused about what "measure" and "instrument" to put here, since I am not 100% sure what Level 1 MCDI translates to. Maybe it would make sense to dedicate a slot in a meeting to drawing some sort of decision tree/diagram for me, so that I don't have to ask this for the next 100 datasets.
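One option (just a sketch of what I mean, not a decision): score the standard checklist and the 8 custom words separately, so the comparable raw score isn't inflated. Function and argument names here are made up; this assumes we have item-level responses and the official word list.

```python
def split_cdi_raw(item_responses, standard_items):
    """Split a CDI raw score into (standard_raw, extra_raw).

    item_responses: {word: True/False} item-level responses (assumed format)
    standard_items: set of words on the official Level 1 MCDI checklist
    Only the standard_raw part would go into the shared raw score column;
    the extra words get recorded separately (or dropped).
    """
    standard_raw = sum(1 for w, v in item_responses.items() if v and w in standard_items)
    extra_raw = sum(1 for w, v in item_responses.items() if v and w not in standard_items)
    return standard_raw, extra_raw
```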
monitor_size_x and monitor_size_y are missing (not sure how important these are with icatcher).
The looking score graph (sample-average preference for the target, from -1 to 1 over time) looks like it's shifted: there is never a preference for the distractor, even before the onset of the target word. I haven't done anything to the time data other than normalize it, and didn't see anything weird in the data itself, so another pair of eyes would be great here.
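A quick sanity check I could run for this: if the timestamps are aligned correctly, the mean preference in a pre-onset baseline window should hover around 0; if it comes out clearly positive, the timeline is probably shifted. This is only a sketch with assumed argument names, not the actual import code.

```python
def baseline_preference(times_ms, scores, onset_ms=0.0, window_ms=2000.0):
    """Mean preference score (-1..1) in the window before target onset.

    times_ms / scores: per-sample timestamps and looking scores (assumed
    to be plain sequences here). Returns None if no samples fall in the
    baseline window. A value well above 0 would point to a time shift.
    """
    vals = [s for t, s in zip(times_ms, scores)
            if onset_ms - window_ms <= t < onset_ms and s is not None]
    if not vals:
        return None
    return sum(vals) / len(vals)
```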
Seems to be the WG short form; I will poke at the raw data to get clean CDI scores for both prod and comp.
Will reach out to Elena. Current working hypothesis:
I fixed the CDI, but there was one suspicious thing:
The CDI data for Subject_55 is missing from their wide-format table (all other CDI data is there). This is likely because Subject_55 is also missing from the underlying raw CDI data they provided us with. While Subject_55 is missing, Subject_5 appears twice with the exact same CDI pattern. This looks like a typo that led to data loss, so the import assumes there is no CDI data for Subject_55, but it might be worth asking the authors again.
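For future datasets, this kind of problem (a duplicated id shadowing a missing one) is easy to flag automatically. A minimal sketch, assuming we have the list of ids in the raw CDI table and the list of subjects we expect:

```python
from collections import Counter

def check_cdi_ids(cdi_ids, expected_ids):
    """Return (missing, duplicated) subject ids.

    cdi_ids: ids as they appear in the raw CDI table (one per row)
    expected_ids: subjects we expect CDI data for
    Catches exactly the Subject_5 / Subject_55 situation: Subject_55
    shows up as missing, Subject_5 shows up as duplicated.
    """
    counts = Counter(cdi_ids)
    missing = sorted(set(expected_ids) - set(counts))
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated
```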
Notes: has two labels; use the first one for label and target onset.
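The "use the first one" rule could be applied in the import like this. Note that the delimiter is a guess on my part (I haven't confirmed how the two labels are joined in the raw data), so treat this as a placeholder:

```python
def first_label(raw_labels, sep=";"):
    """Keep only the first of multiple labels for `label` / target onset.

    raw_labels: the raw label string, e.g. "ball; doggy" (format assumed);
    sep: the delimiter between labels -- ';' is an unconfirmed guess.
    """
    return raw_labels.split(sep)[0].strip()
```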