langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

Import French-English bilingual data from Byers-Heinlein et al. #292

Closed kristabh closed 5 months ago

kristabh commented 1 year ago

We have a preprint with open data now available with CDIs from 743 bilingual infants (French-English). Please feel free to import this into Wordbank if you have the opportunity.

Byers-Heinlein, K., Gonzalez-Barrero, A. M., Schott, E., & Killam, H. (2023). Sometimes larger, sometimes smaller: Measuring vocabulary in monolingual and bilingual infants and toddlers. https://doi.org/10.31234/osf.io/x7s4u

https://github.com/kbhlab/bilingual-cdi-public

@hilary-rose wrangled the dataset and can help if there are any questions

alvinwmtan commented 1 year ago

Hi @kristabh / @hilary-rose, I just had a look at the GitHub repo and it seems like there's only summary data; would you be able to share the original raw item-level data? Thanks!

hilary-rose commented 1 year ago

Hi @alvinwmtan, happy to send the item-level data by email! What's the best address to send to?

alvinwmtan commented 1 year ago

Hi @hilary-rose, please send it to tanawm {at} stanford {dot} edu 😃

alvinwmtan commented 1 year ago

Hi @hilary-rose, just wanted to do a quick check on some of the French WS complexity items. There are the following items in the data:

Are the labels for these items swapped by any chance?

alvinwmtan commented 1 year ago

Note: items mentioned above were swapped (via email thread).

Files are ready for import:
[French_Quebecois_WG].csv
EnglishWG_ByersHeinlein_data.csv
EnglishWG_ByersHeinlein_fields.csv
EnglishWG_ByersHeinlein_values.csv
EnglishWS_ByersHeinlein_data.csv
EnglishWS_ByersHeinlein_fields.csv
EnglishWS_ByersHeinlein_values.csv
FrenchQuebecoisWG_ByersHeinlein_data.csv
FrenchQuebecoisWG_ByersHeinlein_fields.csv
FrenchQuebecoisWG_ByersHeinlein_values.csv
FrenchQuebecoisWS_ByersHeinlein_data.csv
FrenchQuebecoisWS_ByersHeinlein_fields.csv
FrenchQuebecoisWS_ByersHeinlein_values.csv

HenryMehta commented 7 months ago

@alvinwmtan I've tried loading these, starting with the English WG, but the current [English_WG].csv has no item_445.

I need a corrected file please

HenryMehta commented 7 months ago

@alvinwmtan and the FrenchWS dataset (FrenchQuebecoisWS_ByersHeinlein_data.csv) has fields that don't appear in the [French_Quebecois_WS].csv, for example 'votre enfant parle-t-il d\'ΩvΩnements passΩs ou de personnes qui ne sont pas prΩsentes? par exemple si vous avez vu une parade ensemble la semaine derniΨre lui arrive-t-il de vous en "parler" en mentionnant des mots comme "parade" "clown" etc.?'

hilary-rose commented 7 months ago

Hi everyone, I saw Henry's messages above and checked the data we have on hand. It seems like our English WG dataset has item 445 in it. If you need me to re-share the dataset, please let me know!

alvinwmtan commented 7 months ago

@HenryMehta Here's the fixed English WG. [English_WG].csv

I don't quite understand what the problem is for the French WS—the item you mentioned appears as item_665 for me in both the fields file and in the form definition.

HenryMehta commented 7 months ago

@alvinwmtan I now get an error: English_American_WG has no field named 'item_466'

alvinwmtan commented 7 months ago

@HenryMehta Sorry about that, there were two items that I missed (also item_478) [English_WG].csv
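A check along these lines could flag such gaps before an import is attempted. This is only a minimal sketch: the column names `itemID` and `field`, the `item_` prefix filter, and the function names are assumptions for illustration, not taken from the Wordbank import code or the actual file layouts.

```python
import csv
import io


def column_values(csv_text, column):
    """Collect the non-empty values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def missing_items(definition_csv, fields_csv,
                  definition_col="itemID", fields_col="field"):
    """Return item ids referenced in the fields file but absent from the
    form definition. Column names are hypothetical defaults."""
    defined = column_values(definition_csv, definition_col)
    referenced = column_values(fields_csv, fields_col)
    # Only compare item fields; other fields (e.g. ids, dates) are skipped.
    items = {name for name in referenced if name.startswith("item_")}
    return sorted(items - defined)
```

Running `missing_items` on the text of a form definition and a fields file would list every item id that the definition lacks in one pass, rather than surfacing them one error at a time.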

HenryMehta commented 7 months ago

@alvinwmtan deploying to dev now

HenryMehta commented 7 months ago

@alvinwmtan and the FrenchWS dataset (FrenchQuebecoisWS_ByersHeinlein_data.csv) has fields that don't appear in the [French_Quebecois_WS].csv, for example 'votre enfant parle-t-il d'ΩvΩnements passΩs ou de personnes qui ne sont pas prΩsentes? par exemple si vous avez vu une parade ensemble la semaine derniΨre lui arrive-t-il de vous en "parler" en mentionnant des mots comme "parade" "clown" etc.?'

@alvinwmtan I still have this issue

alvinwmtan commented 7 months ago

@HenryMehta I still don't quite understand what the problem is. I do see that item in the data file, and also see it in the fields file as item_665, which is in [French_Quebecois_WS].csv. Is the problem that there are unreadable characters due to encoding issues?

HenryMehta commented 7 months ago

The problem is the text doesn't match. The formats must be different or something but something is well off between the definition, fields and data files

HenryMehta commented 7 months ago

@alvinwmtan I've worked through the files. There were a lot of errors where the files hadn't been saved as UTF-8, so the strings were not matching. Sorted and deploying to dev now.
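For reference, a minimal sketch of how files that were not saved as UTF-8 might be detected and re-saved. The helper names and the `cp1252` fallback encoding are assumptions for illustration; the thread does not say which encoding the files were actually in or how they were fixed.

```python
from pathlib import Path


def is_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return the content as UTF-8 bytes.

    Valid UTF-8 input is returned unchanged; anything else is decoded
    with `fallback` (an assumed source encoding) and re-encoded.
    """
    if is_utf8(raw):
        return raw
    return raw.decode(fallback).encode("utf-8")


def resave_as_utf8(path: Path, fallback: str = "cp1252") -> bool:
    """Rewrite a file in place as UTF-8; return True if it was changed."""
    raw = path.read_bytes()
    fixed = to_utf8(raw, fallback)
    if fixed != raw:
        path.write_bytes(fixed)
        return True
    return False
```

Checking the definition, fields and data files with `is_utf8` before comparing strings would show whether a mismatch comes from encoding rather than from the item text itself.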

alvinwmtan commented 7 months ago

@HenryMehta looks good