langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

Estonian data #297

Closed mcfrank closed 7 months ago

mcfrank commented 1 year ago

From Ada Urm:

As it turns out, it took a bit longer to get the datasets ready and checked. So here I have added the Estonian CDI: Words and Sentences dataset (in .csv format). I have included in a separate file the translated scale items according to the current dataset. I also included the ECDI2 form.

Because the data are licensed through our university, they may not be used for commercial purposes.

If our datasets are used, the following sources need to be credited:

Urm, A., & Tulviste, T. (2016). Sources of individual variation in Estonian toddlers’ expressive vocabulary. First Language, 36(6), 580-600.

Tulviste, T. (2007). Variation in vocabulary development among Estonian children as a function of child’s gender, birth order, child-care, and parental education. In M. Eriksson (Ed.), Proceedings from the First European Network Meeting on the Communicative Development Inventories (pp. 16-21). Gävle, Sweden: University of Gävle.

I am ready to answer any questions regarding the dataset and help out in any way I can.

ECDI2_translated scale items for wordbank.docx ECDI2_Words and Sentences.pdf ECDI2_data for wordbank.csv

mcfrank commented 1 year ago

Updated data from Ada, now in SPSS format. She has checked this one, so don't use the one above.

ECDI2_dataset_wordbank_corrected.sav.zip

alvinwmtan commented 1 year ago

Files ready for import:

[Estonian_WS].csv EstonianWS_Urm_data.csv EstonianWS_Urm_fields.csv EstonianWS_Urm_values.csv

HenryMehta commented 10 months ago

@alvinwmtan when I download the [Estonian_WS].csv file, I get invalid characters, for example:

item_5,sounds,mŠŠ,mŠŠ,produces,word,baa,baa baa

According to the data file this item should be: mää (Is this correct?)

Can you confirm that this is right and try uploading a file that downloads with the correct characters? If you can't, can you tell me which character set you're using, and I'll try to convert it to UTF-8? I think it's probably the way the file was saved.

Also, what citation should I use?

alvinwmtan commented 10 months ago

@HenryMehta sorry about that, here's the UTF-8 version: [Estonian_WS].csv
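
For anyone hitting the same problem with an older copy: the thread does not say which encoding the original file used, but the garbling shown above is consistent with Mac OS Roman bytes being read as Windows-1252 (byte 0x8A is "ä" in the former and "Š" in the latter). A minimal Python sketch under that assumption; the helper name `repair_mojibake` is hypothetical:

```python
def repair_mojibake(text: str, wrong: str = "cp1252", actual: str = "mac_roman") -> str:
    """Undo a decode done with the wrong codec.

    Assumption (not confirmed in this thread): the file was saved as
    Mac OS Roman and then read as Windows-1252.
    """
    # Re-encode with the codec that was wrongly applied to recover the raw
    # bytes, then decode those bytes with the codec the file was saved in.
    return text.encode(wrong).decode(actual)


garbled = "mŠŠ"
fixed = repair_mojibake(garbled)
print(fixed)

# The repaired string can then be written back out as UTF-8.
utf8_bytes = fixed.encode("utf-8")
```

If the bytes were actually produced by a different pair of encodings, the round trip will either raise an error or give another wrong string, so it is worth spot-checking a few known words before converting a whole file.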

alvinwmtan commented 10 months ago

Citations are:

Urm, A., & Tulviste, T. (2016). Sources of individual variation in Estonian toddlers’ expressive vocabulary. First Language, 36(6), 580-600.

Tulviste, T. (2007). Variation in vocabulary development among Estonian children as a function of child’s gender, birth order, child-care, and parental education. In M. Eriksson (Ed.), Proceedings from the First European Network Meeting on the Communicative Development Inventories (pp. 16-21). Gävle, Sweden: University of Gävle.

HenryMehta commented 10 months ago

@alvinwmtan for the word tulevik, row 1070 has the value 3, which isn't specified in the values file. What do you want 3 coded as?

alvinwmtan commented 9 months ago

@HenryMehta let's code 3 as NA (error).
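
A minimal Python sketch of that rule, for illustration only: any response code not listed in the values file becomes missing (`None`). The function name `recode_response` and the set of valid codes below are hypothetical; the real codes are defined in EstonianWS_Urm_values.csv and are not reproduced in this thread.

```python
from typing import Optional, Set


def recode_response(raw: str, valid_codes: Set[str]) -> Optional[str]:
    """Return the code if the values file defines it, otherwise None (NA)."""
    value = raw.strip()
    return value if value in valid_codes else None


# Hypothetical valid codes, used only to demonstrate the behaviour.
example_valid_codes = {"0", "1"}

print(recode_response("1", example_valid_codes))  # kept as-is
print(recode_response("3", example_valid_codes))  # not in the values file, so NA
```

Treating every undefined code as missing, rather than special-casing 3, means any other stray value would be handled the same way.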

HenryMehta commented 9 months ago

@alvinwmtan ready to test in dev

alvinwmtan commented 9 months ago

@HenryMehta looks good from R, thanks!