adriansteffan opened 5 months ago
The dataset passes the validator, but there are a few open points:
The emails state that they used 8 additional words in their CDI. How do we encode this, since the raw score will be skewed? Also, I am still confused about what "measure" and "instrument" to put here, since I am not 100% sure what Level 1 MCDI translates to. Maybe it would make sense to dedicate a slot in a meeting to drawing some sort of decision tree/diagram for me, so that I don't have to ask this for the next 100 datasets.
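One option (just a sketch of what I mean, not a decision): score the standard checklist and the 8 custom words separately, so the comparable raw score isn't inflated. Function and argument names here are made up; this assumes we have item-level responses and the official word list.

```python
def split_cdi_raw(item_responses, standard_items):
    """Split a CDI raw score into (standard_raw, extra_raw).

    item_responses: {word: True/False} item-level responses (assumed format)
    standard_items: set of words on the official Level 1 MCDI checklist
    Only the standard_raw part would go into the shared raw score column;
    the extra words get recorded separately (or dropped).
    """
    standard_raw = sum(1 for w, v in item_responses.items() if v and w in standard_items)
    extra_raw = sum(1 for w, v in item_responses.items() if v and w not in standard_items)
    return standard_raw, extra_raw
```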
monitor_size_x and monitor_size_y are missing (not sure how important these are with icatcher).
The looking score graph (sample-average preference for the target, from -1 to 1 over time) looks like it's shifted: there is never a preference for the distractor, even before the onset of the target word. I haven't done anything to the time data other than normalize it, and didn't see anything weird in the data itself, so another pair of eyes would be great here.
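A quick sanity check I could run for this: if the timestamps are aligned correctly, the mean preference in a pre-onset baseline window should hover around 0; if it comes out clearly positive, the timeline is probably shifted. This is only a sketch with assumed argument names, not the actual import code.

```python
def baseline_preference(times_ms, scores, onset_ms=0.0, window_ms=2000.0):
    """Mean preference score (-1..1) in the window before target onset.

    times_ms / scores: per-sample timestamps and looking scores (assumed
    to be plain sequences here). Returns None if no samples fall in the
    baseline window. A value well above 0 would point to a time shift.
    """
    vals = [s for t, s in zip(times_ms, scores)
            if onset_ms - window_ms <= t < onset_ms and s is not None]
    if not vals:
        return None
    return sum(vals) / len(vals)
```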
Seems to be the WG short form; I will poke at the raw data to get clean CDI scores for both prod and comp.
Will reach out to Elena. Current working hypothesis:
I fixed the CDI, but there was one suspicious thing:
The CDI data for Subject_55 is missing from their wide-format table (all other CDI data is there). This is likely because Subject_55 is also missing from the underlying raw CDI data they provided us with. While Subject_55 is missing, Subject_5 appears twice with the exact same CDI pattern. This looks like a typo that led to data loss, so the import assumes there is no CDI data for Subject_55, but it might be worth asking the authors again.
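For future datasets, this kind of problem (a duplicated id shadowing a missing one) is easy to flag automatically. A minimal sketch, assuming we have the list of ids in the raw CDI table and the list of subjects we expect:

```python
from collections import Counter

def check_cdi_ids(cdi_ids, expected_ids):
    """Return (missing, duplicated) subject ids.

    cdi_ids: ids as they appear in the raw CDI table (one per row)
    expected_ids: subjects we expect CDI data for
    Catches exactly the Subject_5 / Subject_55 situation: Subject_55
    shows up as missing, Subject_5 shows up as duplicated.
    """
    counts = Counter(cdi_ids)
    missing = sorted(set(expected_ids) - set(counts))
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated
```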
Notes: has two labels; use the first one for label and target onset.
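The "use the first one" rule could be applied in the import like this. Note that the delimiter is a guess on my part (I haven't confirmed how the two labels are joined in the raw data), so treat this as a placeholder:

```python
def first_label(raw_labels, sep=";"):
    """Keep only the first of multiple labels for `label` / target onset.

    raw_labels: the raw label string, e.g. "ball; doggy" (format assumed);
    sep: the delimiter between labels -- ';' is an unconfirmed guess.
    """
    return raw_labels.split(sep)[0].strip()
```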