langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

catalan data #240

Closed mcfrank closed 7 months ago

mcfrank commented 2 years ago

From Isabel Serrat

Here I attach the SPSS file of the Catalan MCB-CDI-I with the data for the variables sex, age in months, birth order, number of siblings, mother's educational level and bilingualism.

Note about one participant in the template: 
We detected two participants with the same number (id 581). We do not know what could have happened, but we only have one questionnaire that corresponds to that number. If we do not know what happened, we will have to delete the information about that participant. I will let you know which one it is.

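Plantilla variables CDI-I Wordbank maig 22.sav.zip https://github.com/langcog/wordbank/files/8327046/Plantilla.variables.CDI-I.Wordbank.maig.22.sav.zip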

@vmarchman maybe you can convert from SAV and reupload?

vmarchman commented 2 years ago

Here are the definitions:

Sex: (sexe)

1 = nena = girl
2 = nen = boy

Birth order (order_naixement):

1 = first
2 = second
3 = third
4 = later

Bilingualism (MonoBiling): (not sure what exactly these definitions mean)

1 = monolingual
2 = "family bilingualism" (bilinguisme familiar)
3 = "other bilingualisms" (Altres bilinguismes)

Maternal Education (escola_mare):

1 = without schooling
2 = primary
3 = secondary
4 = university
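The definitions above can be written down as a lookup table. This is only a sketch: the variable names and labels are copied from the comment, while `decode_row` and the example record are hypothetical and assume the SAV file has been exported to rows of strings (for example via CSV).

```python
# Codebook transcribed from the definitions above. Keys are the SPSS
# variable names given in parentheses; values map numeric codes to labels.
CODEBOOK = {
    "sexe": {1: "girl", 2: "boy"},
    "order_naixement": {1: "first", 2: "second", 3: "third", 4: "later"},
    "MonoBiling": {
        1: "monolingual",
        2: "family bilingualism",
        3: "other bilingualisms",
    },
    "escola_mare": {
        1: "without schooling",
        2: "primary",
        3: "secondary",
        4: "university",
    },
}


def decode_row(row):
    """Return a copy of row with coded fields replaced by their labels.

    Missing or unrecognised codes become None instead of raising.
    """
    decoded = dict(row)
    for field, labels in CODEBOOK.items():
        if field not in row:
            continue
        try:
            # Exports often store codes as "1" or "1.0", so go via float.
            code = int(float(row[field]))
        except (TypeError, ValueError, OverflowError):
            decoded[field] = None
            continue
        decoded[field] = labels.get(code)
    return decoded


# Hypothetical record for illustration; not taken from the real file.
example = {
    "id": "1",
    "sexe": "1",
    "order_naixement": "4",
    "MonoBiling": "2",
    "escola_mare": "NA",
}
print(decode_row(example))
```

Returning None for unknown codes keeps one bad cell from aborting the whole conversion; the affected rows can then be reviewed by hand.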


HenryMehta commented 2 years ago

This is the data as a csv file

output.csv

HenryMehta commented 2 years ago

Actually, I don't think this can be the full dataset, because it contains only the child info rather than the administration responses.

mcfrank commented 2 years ago

oops - we need to get the scans rekeyed! https://drive.google.com/drive/u/1/folders/1l3ZBAAu1UxeWHvg5kuMuV_903VCLXpy-

mcfrank commented 1 year ago

And here's the WS data as well.

Metadata in SPSS: Plantilla variables CDI-II Wordbank 3_23.sav.zip

Here are the scans: https://drive.google.com/drive/folders/184w5qyLRcQh7XNtilzNPTh_IQ2niTDpX?usp=share_link

mcfrank commented 1 year ago

@rbzsparks is going to work on flatworld rekeying

alvinwmtan commented 1 year ago

WG: [Catalan_WG].csv, CatalanWG_Serrat_data.csv, CatalanWG_Serrat_fields.csv, CatalanWG_Serrat_values.csv

The demographics for child 581 were taken from the first entry in the metadata (there are two entries, as noted above).

WS pending rekeying
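For reference, the "first entry wins" rule used for child 581 can be sketched as follows. The sample rows and the column name `id` are assumptions made for illustration, not the real metadata.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata rows; only the duplicated ID 581 mirrors the thread.
SAMPLE = """id,sexe,edat
580,1,12
581,2,10
581,1,14
582,2,9
"""


def keep_first_per_id(rows, id_field="id"):
    """Keep the first row seen for each ID and report which IDs repeated."""
    counts = Counter(r[id_field] for r in rows)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    seen = set()
    kept = []
    for r in rows:
        if r[id_field] not in seen:
            seen.add(r[id_field])
            kept.append(r)
    return kept, duplicated


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept, duplicated = keep_first_per_id(rows)
print(duplicated)
print([r["id"] for r in kept])
```

Reporting the duplicated IDs alongside the kept rows leaves a record of which participants need to be confirmed with the contributor.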

alvinwmtan commented 1 year ago

@mcfrank Not sure how much we want to dive into this, but in the WS scans, the file labelled 545-558_bo.pdf appears to contain child IDs 545, 557, and then 507–517 (which duplicate those in the file 500-517.pdf). Maybe send an email to the contributor to ask whether they have the original scan? (If not, I'll discard the duplicates.)
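A check like the one below could flag this kind of overlap automatically once the IDs are keyed. It is a sketch: the IDs for 545-558_bo.pdf follow the comment above, the contents of 500-517.pdf are assumed to match its file name, and both helper functions are hypothetical.

```python
from collections import defaultdict

# IDs per scan file. The first entry follows the comment above; the second
# ASSUMES that 500-517.pdf contains every ID its name suggests.
ids_by_file = {
    "545-558_bo.pdf": [545, 557, *range(507, 518)],
    "500-517.pdf": list(range(500, 518)),
}


def ids_in_multiple_files(ids_by_file):
    """Map each child ID found in more than one file to those file names."""
    files_by_id = defaultdict(set)
    for name, ids in ids_by_file.items():
        for child_id in ids:
            files_by_id[child_id].add(name)
    return {
        cid: sorted(names)
        for cid, names in sorted(files_by_id.items())
        if len(names) > 1
    }


def ids_outside_named_range(name, ids):
    """Return IDs not covered by the 'start-end' prefix of the file name."""
    start, end = name.split(".")[0].split("_")[0].split("-")
    return [i for i in ids if not int(start) <= i <= int(end)]


overlap = ids_in_multiple_files(ids_by_file)
print(sorted(overlap))
print(ids_outside_named_range("545-558_bo.pdf", ids_by_file["545-558_bo.pdf"]))
```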

alvinwmtan commented 1 year ago

WS: [Catalan_WS].csv, CatalanWS_Serrat_data.csv, CatalanWS_Serrat_fields.csv, CatalanWS_Serrat_values.csv, Catalan_notes.md

Missing scans ignored; remaining WS data as above

HenryMehta commented 10 months ago

@alvinwmtan What are the contributor and citation for this dataset?

alvinwmtan commented 10 months ago

Contributor: Elisabet Serrat Sellabona, Universitat de Girona

@mcfrank do you know what the citation is for this?

HenryMehta commented 10 months ago

@alvinwmtan Line 48 has an age of NA. We cannot import this record without a valid age. What would you like me to do?

mcfrank commented 10 months ago

Guessing it's: https://www.torrossa.com/en/resources/an/5155491

Mike


alvinwmtan commented 10 months ago

@HenryMehta Sorry for not catching this. Let's exclude all the participants with age NA (there are a few of them, not just the one on line 48).

HenryMehta commented 10 months ago

@alvinwmtan deployed to dev for testing

alvinwmtan commented 9 months ago

@HenryMehta could you also deploy the WG? Thanks!

HenryMehta commented 9 months ago

@alvinwmtan sorry, missed it. Deployed to dev now

alvinwmtan commented 9 months ago

@HenryMehta WG looks good. For WS, I realised that we should be using the "edat" column, not "age"; this should have far fewer NAs.

HenryMehta commented 9 months ago

@alvinwmtan It is already using edat

alvinwmtan commented 9 months ago

@HenryMehta hmm, in that case there should be 859 administrations, not 605 (which it is currently)

HenryMehta commented 9 months ago

@alvinwmtan There are 605 rows in the data file

alvinwmtan commented 9 months ago

@HenryMehta this file has 866 rows; perhaps you manually filtered out the rows with age NA rather than those with edat NA?

CatalanWS_Serrat_data.csv

HenryMehta commented 9 months ago

@alvinwmtan Those without an age cannot be loaded, so I deleted them from the dataset.

alvinwmtan commented 9 months ago

@HenryMehta yes, those without edat should be deleted (not those without a value in the column labelled age, which is not in fact the data_age column for this dataset)
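To make the difference concrete, here is a small sketch of filtering on each column. The sample rows, the `child_id` column and the set of missing-value markers are assumptions; only the column names `age` and `edat` come from the thread.

```python
import csv
import io

# Hypothetical rows: "age" is mostly empty, while "edat" holds the usable age.
SAMPLE = """child_id,age,edat
1,NA,18
2,20,20
3,NA,NA
4,NA,25
"""

# Strings treated as missing; an assumption about how NAs were exported.
MISSING = {"", "NA", "NaN"}


def drop_missing(rows, age_field):
    """Keep only the rows that have a usable value in age_field."""
    return [r for r in rows if (r.get(age_field) or "").strip() not in MISSING]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_edat = drop_missing(rows, "edat")  # intended filter
by_age = drop_missing(rows, "age")    # filter on the wrong column
print(len(rows), len(by_edat), len(by_age))
```

Passing the age column name as a parameter makes it explicit which field serves as the data age for a given dataset, which is the point that caused the mismatch in row counts here.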

HenryMehta commented 9 months ago

@alvinwmtan deployed

alvinwmtan commented 9 months ago

@HenryMehta looks good now, thanks!