langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

Add language exposure values to Spanish (Mexican) WS – Marchman Dallas Bilingual #280

Closed by alvinwmtan 1 year ago

alvinwmtan commented 2 years ago

The Marchman Dallas Bilingual dataset only included language exposures for the English (American) WS dataset. For convenience these data should be ported over to the Spanish (Mexican) WS dataset as well.

These values were obtained by matching child IDs across the two datasets, then fuzzy-matching the administration ages (±1 month). As a result, some administrations don't have language exposure proportions (English and Spanish are still listed, just with blank proportions).
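A minimal sketch of that matching in Python, for illustration only and not the script actually used. It assumes each administration is a record with hypothetical `child_id`, `age` (in months), and `exposure` fields, and that when several English administrations fall within the tolerance the closest age wins (also an assumption):

```python
from collections import defaultdict


def port_exposures(english_admins, spanish_admins, tolerance=1):
    """Copy exposure values from English to Spanish administrations.

    Records are matched on child_id exactly, then on age within
    +/- `tolerance` months. Field names here are hypothetical.
    """
    # Index the English administrations by child for quick lookup.
    by_child = defaultdict(list)
    for admin in english_admins:
        by_child[admin["child_id"]].append(admin)

    ported = []
    for admin in spanish_admins:
        # Keep only same-child administrations whose age is close enough.
        candidates = [
            e for e in by_child.get(admin["child_id"], [])
            if abs(e["age"] - admin["age"]) <= tolerance
        ]
        # Assumption: prefer the closest age when several candidates match.
        best = min(
            candidates,
            key=lambda e: abs(e["age"] - admin["age"]),
            default=None,
        )
        if best is not None:
            exposure = dict(best["exposure"])
        else:
            # No match: languages stay listed, proportions left blank.
            exposure = {"English": None, "Spanish": None}
        ported.append({**admin, "exposure": exposure})
    return ported


english = [
    {"child_id": "c1", "age": 24, "exposure": {"English": 40, "Spanish": 60}},
    {"child_id": "c2", "age": 30, "exposure": {"English": 70, "Spanish": 30}},
]
spanish = [
    {"child_id": "c1", "age": 25},  # within 1 month of the English record
    {"child_id": "c2", "age": 33},  # 3 months apart, so no match
    {"child_id": "c3", "age": 24},  # child absent from the English data
]
ported = port_exposures(english, spanish)
```

Unmatched administrations keep both languages with blank proportions, which mirrors the gaps described above.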

The updated files are SpanishMexicanWS_Marchman_Dallas_data.csv and SpanishMexicanWS_Marchman_Dallas_fields.csv.

HenryMehta commented 2 years ago

@alvinwmtan I've applied these to the development database. Could you confirm they're correct? Then I'll apply them to production.

alvinwmtan commented 2 years ago

@HenryMehta looks good—you may need to cache the comprehension/production values again though

HenryMehta commented 2 years ago

@alvinwmtan All deployed to production (and I ran the cache in dev as well)

alvinwmtan commented 2 years ago

@HenryMehta sorry to revisit this—they seem to be NULL in both prod and dev?

HenryMehta commented 1 year ago

@alvinwmtan I can see a few rows (11, I think) with null values in instruments_spanish_mexican_ws, so I tried reloading SpanishMexicanWS_Marchman_Dallas. That didn't change the records, so either the nulls aren't from that dataset or the load doesn't work in certain cases (possibly when there are no responses at all). How did you conclude that they come from this dataset, so that I can follow the logic through?

HenryMehta commented 1 year ago

Never mind, found the null items.

HenryMehta commented 1 year ago

@alvinwmtan I think I've found and sorted the issue, and I've applied the fix to both dev and prod. Could you check?

It seems that only the complexity questions were affected. Based on the code, I think this is probably an issue for any dataset with complexity questions. If the fix has worked, could you confirm or deny my suspicion, please?

alvinwmtan commented 1 year ago

@HenryMehta looks fixed now. I'm not entirely sure whether the issue was with complexity items, because those are relatively sparse across datasets, but hopefully future datasets with complexity items won't have issues.