langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

Import French-English bilingual data from Byers-Heinlein et al. #292

Closed: kristabh closed this issue 9 months ago

kristabh commented 1 year ago

We now have a preprint with open data, including CDIs from 743 French-English bilingual infants. Please feel free to import this into Wordbank if you have the opportunity.

Byers-Heinlein, K., Gonzalez-Barrero, A. M., Schott, E., & Killam, H. (2023). Sometimes larger, sometimes smaller: Measuring vocabulary in monolingual and bilingual infants and toddlers. https://doi.org/10.31234/osf.io/x7s4u

https://github.com/kbhlab/bilingual-cdi-public

@hilary-rose wrangled the dataset and can help with any questions.

alvinwmtan commented 1 year ago

Hi @kristabh / @hilary-rose, I just had a look at the Github repo and it seems like there's only summary data—would you be able to share the original raw item-level data? Thanks!

hilary-rose commented 1 year ago

Hi @alvinwmtan, happy to send the item-level data by email! What's the best address to send to?

alvinwmtan commented 1 year ago

Hi @hilary-rose, please send it to tanawm {at} stanford {dot} edu 😃

alvinwmtan commented 1 year ago

Hi @hilary-rose, just wanted to do a quick check on some of the French WS complexity items. There are the following items in the data:

Are the labels for these items swapped by any chance?

alvinwmtan commented 1 year ago

Note: the items mentioned above were indeed swapped (confirmed via email).

Files are ready for import:
[French_Quebecois_WG].csv
EnglishWG_ByersHeinlein_data.csv
EnglishWG_ByersHeinlein_fields.csv
EnglishWG_ByersHeinlein_values.csv
EnglishWS_ByersHeinlein_data.csv
EnglishWS_ByersHeinlein_fields.csv
EnglishWS_ByersHeinlein_values.csv
FrenchQuebecoisWG_ByersHeinlein_data.csv
FrenchQuebecoisWG_ByersHeinlein_fields.csv
FrenchQuebecoisWG_ByersHeinlein_values.csv
FrenchQuebecoisWS_ByersHeinlein_data.csv
FrenchQuebecoisWS_ByersHeinlein_fields.csv
FrenchQuebecoisWS_ByersHeinlein_values.csv

HenryMehta commented 11 months ago

@alvinwmtan I've tried loading these, starting with English WG, but the current [English_WG].csv has no item_445.

I need a corrected file, please.

HenryMehta commented 11 months ago

@alvinwmtan Also, the French WS dataset (FrenchQuebecoisWS_ByersHeinlein_data.csv) has fields that don't appear in [French_Quebecois_WS].csv, for example the item about talking about past events or people who aren't present: 'votre enfant parle-t-il d\'ΩvΩnements passΩs ou de personnes qui ne sont pas prΩsentes? par exemple si vous avez vu une parade ensemble la semaine derniΨre lui arrive-t-il de vous en "parler" en mentionnant des mots comme "parade" "clown" etc.?'

hilary-rose commented 11 months ago

Hi everyone, I saw Henry's messages above and checked the data we have on hand. It seems like our English WG dataset has item 445 in it. If you need me to re-share the dataset, please let me know!

alvinwmtan commented 11 months ago

@HenryMehta Here's the fixed English WG. [English_WG].csv

I don't quite understand the problem with the French WS: the item you mentioned appears as item_665 for me in both the fields file and the form definition.

HenryMehta commented 11 months ago

@alvinwmtan I now get an error: English_American_WG has no field named 'item_466'

alvinwmtan commented 11 months ago

@HenryMehta Sorry about that, I missed two items (item_466 and also item_478). Here's the updated file: [English_WG].csv
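In this thread the missing items surfaced one import attempt at a time (item_445, then item_466 and item_478). A pre-import check that compares every item ID referenced by a fields file against the form definition would report them all at once. The sketch below is hypothetical and is not Wordbank's actual import code: the column names `itemID` (definition) and `item` (fields file) and the `item_` prefix filter are assumptions to adjust to the real file layouts.

```python
import csv
from pathlib import Path


def load_column(path: Path, column: str) -> list[str]:
    """Return every value in one named column of a CSV file.

    utf-8-sig is used so a leading BOM does not corrupt the first header name.
    """
    with path.open(newline="", encoding="utf-8-sig") as f:
        return [row[column] for row in csv.DictReader(f)]


def missing_items(defined_ids, referenced_ids) -> list[str]:
    """Item IDs referenced but not defined, deduplicated in first-seen order."""
    defined = set(defined_ids)
    return list(dict.fromkeys(i for i in referenced_ids if i not in defined))


def check_import(definition_csv: Path, fields_csv: Path,
                 id_column: str = "itemID", field_column: str = "item") -> list[str]:
    """Compare a fields file against a form definition.

    The default column names are assumptions, not the real Wordbank schema.
    """
    defined = load_column(definition_csv, id_column)
    # Only check values that look like item IDs; non-item fields such as
    # demographics are assumed to use other names.
    referenced = [v for v in load_column(fields_csv, field_column)
                  if v.startswith("item_")]
    return missing_items(defined, referenced)
```

Under those assumed column names, `check_import(Path("[English_WG].csv"), Path("EnglishWG_ByersHeinlein_fields.csv"))` would return every referenced item ID absent from the definition in a single pass, rather than failing on the first one.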

HenryMehta commented 11 months ago

@alvinwmtan deploying to dev now

HenryMehta commented 11 months ago

@alvinwmtan I still have the issue with the French WS dataset: FrenchQuebecoisWS_ByersHeinlein_data.csv has fields that don't appear in [French_Quebecois_WS].csv, such as the item about past events or people who aren't present.

alvinwmtan commented 11 months ago

@HenryMehta I still don't quite understand what the problem is. I do see that item in the data file, and also see it in the fields file as item_665, which is in [French_Quebecois_WS].csv. Is the problem that there are unreadable characters due to encoding issues?

HenryMehta commented 11 months ago

The problem is that the text doesn't match. The formats must differ somehow, but something is clearly off between the definition, fields, and data files.

HenryMehta commented 11 months ago

@alvinwmtan I've worked through the files. There were a lot of errors where the files hadn't been saved as UTF-8, so the strings didn't match. Sorted, and deploying to dev now.
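The root cause was encoding: some files had not been saved as UTF-8, so accented French strings decoded differently and no longer matched across the definition, fields, and data files. A minimal sketch of a normalization step follows, assuming the non-UTF-8 files used a legacy single-byte encoding. The fallback list (`cp1252`, `mac_roman`) is a guess, not the encoding these files are known to have used.

```python
from pathlib import Path

# Encodings tried after UTF-8 fails. This list is an assumption: it is not
# known which encoding the original files were saved in.
FALLBACK_ENCODINGS = ("cp1252", "mac_roman")


def decode_any(raw: bytes, fallbacks=FALLBACK_ENCODINGS) -> tuple[str, str]:
    """Decode bytes as UTF-8 (dropping a BOM if present), else try each fallback.

    Returns (text, encoding_used).
    """
    for enc in ("utf-8-sig", *fallbacks):
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode with any of {('utf-8-sig', *fallbacks)}")


def resave_as_utf8(path: Path, fallbacks=FALLBACK_ENCODINGS) -> str:
    """Rewrite a file as UTF-8 if it is not already; return the encoding it was read with."""
    text, enc = decode_any(path.read_bytes(), fallbacks)
    if enc != "utf-8-sig":
        # write_bytes keeps line endings exactly as decoded (no newline translation)
        path.write_bytes(text.encode("utf-8"))
    return enc


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "read as", resave_as_utf8(Path(name)))
```

Because `cp1252` and `mac_roman` can decode almost any byte sequence, a successful fallback is only a guess; spot-checking a few accented words after conversion is still worthwhile.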

alvinwmtan commented 11 months ago

@HenryMehta looks good