langcog / wordbank

open repository of children's vocabulary data
http://wordbank.stanford.edu
GNU General Public License v2.0

Update unilemmas #177

Closed daniellekellier closed 2 years ago

daniellekellier commented 6 years ago

Add unilemmas for the following forms

kachergis commented 3 years ago

I'd be happy to find help for these, but are these up-to-date?

kachergis commented 3 years ago

get_crossling_items() returns 1380 unilemmas, and I guess we need to check which of those are NA for each language. Just looking at a couple items, the above list is out-of-date (and incorrect? or at least Czech, Cantonese, and others don't appear on this list -- but maybe they don't have 'hat' or 'nose'?). Unique languages in get_crossling_data(uni_lemmas = c("hat", "nose")):

 [1] "British Sign Language" "Croatian"              "Danish"                "English (American)"
 [5] "French (French)"       "French (Quebecois)"    "Hebrew"                "Italian"
 [9] "Kiswahili"             "Korean"                "Norwegian"             "Russian"
[13] "Slovak"                "Spanish (Mexican)"     "Swedish"               "Turkish"
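
A minimal sketch of the per-language check described above, in base R on a toy table so it runs without a database connection. The data frame and its column names (`language`, `uni_lemma`) are assumptions for illustration, not the actual wordbankr schema; with real data the `items` table would come from the wordbankr query instead.

```r
# Toy stand-in for a cross-linguistic item table (hypothetical data;
# the column names `language` and `uni_lemma` are assumptions).
items <- data.frame(
  language  = c("Croatian", "Croatian", "Danish", "Czech", "Czech"),
  uni_lemma = c("hat", "nose", "hat", NA, NA),
  stringsAsFactors = FALSE
)

targets <- c("hat", "nose")

# For each language, list the target unilemmas that no item maps to.
missing_by_language <- lapply(
  split(items$uni_lemma, items$language),
  function(lemmas) setdiff(targets, lemmas)
)

# Keep only the languages with at least one gap.
missing_by_language <- Filter(length, missing_by_language)
print(missing_by_language)
```

A language whose items all have NA unilemmas shows up here with every target missing, which is one way a language could drop out of the query result above even though the form contains the word.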

mcfrank commented 3 years ago

@kachergis

kachergis commented 3 years ago

added Latvian WG, Mandarin Taiwanese WG, and Spanish European WG uni_lemmas, updated (added more and fixed a few) Korean WG and Mandarin IC Beijing uni_lemmas

HenryMehta commented 3 years ago

@kachergis has this been merged into master, and do I need to merge it into the Py38 version? If so, which merge was it, so I know what I'm looking to pull back?

kachergis commented 3 years ago

@HenryMehta I made the PR a few weeks back but only just merged it now: https://github.com/langcog/wordbank/commit/829b99be0164a04965111ec5865d3802b10538f7 (so I assume it didn't make the Py38 version). Mika said you likely know how to update these tables, but let her or me know if you don't -- thank you!!!

HenryMehta commented 3 years ago

@kachergis I would appreciate some confirmation on how to merge them, because I've done it before by rebuilding the whole database and I don't want to do that again. I also want to ensure things don't get double counted, which has happened, so can you give specific instructions about exactly what to do? Thanks

HenryMehta commented 3 years ago

@kachergis Hi George, I really need some direction here because I don't know what you've changed or how I should migrate it. I don't see any differences in the numbers. I could just merge the branches, but if I don't know what you've done or what I'm then meant to do, I won't necessarily apply it properly. Thanks

kachergis commented 3 years ago

Hi @HenryMehta -- sorry for the slow reply: I've been traveling. I've updated a few languages' uni_lemma columns for the WG forms, but more are in progress. I've briefly discussed with Mika how to remove and then reingest the updated instruments, so I don't think a full database rebuild will be needed -- but let me get back to you once we're done updating.

mcfrank commented 3 years ago

Instructions for emailing new folks:

Hi there, thanks very much for agreeing to help us with our translation task! 

Instructions

We are trying to get good translations of different lists of words in many different languages into the same set of "universal" concepts, here expressed in English. A list of these concepts is on the "BANK" tab. So, for example, we want to map "perro" in Spanish onto "dog" in English. Of course, exact translation is really hard, and we're just trying to do our best (with your help).

To do the task, navigate to your language, and then, for each word on the list, take a look at the translations that are given. We're looking for the best English translation. These are words that are among the first words that children learn, so your translation should be closest to the meaning of the word as it would be used by a young child (say, under 3 years old).
If the translation is good, just put a 1 in the "is translation good?" column.
If the translation is bad, put a 0, and put an alternative in the "alternative translation" column.
For cases when there are two equally good English words, put both.
If you don't think there is a good translation into a reasonable English word that a kid might know, you can leave the alternative translation blank, and write a note. 
Also feel free to write notes about other words to let us know your thoughts behind your decision or if there is anything else we should consider.
Here's the spreadsheet we're working from. Your language should be on one of the tabs on the bottom. 

https://docs.google.com/spreadsheets/d/15RMcFOURhtv0DxA8aBXaXExq-7qChB6EEnlh8p4V7o4/edit#gid=1108769780

Let me know if you have any other questions. And when you're done, just respond to this email and we'll get you an Amazon gift card in thanks for your time.

kachergis commented 3 years ago

check current coverage

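library(dplyr)  # provides %>%, arrange(), and desc()
# list instruments, highest unilemma coverage first
instr <- wordbankr::get_instruments() %>%
  arrange(desc(unilemma_coverage))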
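
For intuition about what a coverage number like this summarizes, here is a small base-R sketch on toy data. It assumes coverage means the share of an instrument's items that have a non-missing uni_lemma; that definition, the data, and the column names are all assumptions for illustration, not taken from the wordbank code.

```r
# Toy item table (hypothetical data and column names).
items <- data.frame(
  language  = c("Latvian", "Latvian", "Latvian", "Korean", "Korean"),
  form      = "WG",
  uni_lemma = c("dog", NA, "hat", "nose", "dog"),
  stringsAsFactors = FALSE
)

# Flag the items that already have a unilemma assigned.
items$has_lemma <- !is.na(items$uni_lemma)

# Coverage per language/form = mean of the flag (assumed definition).
coverage <- aggregate(has_lemma ~ language + form, data = items, FUN = mean)

# Highest coverage first, mirroring arrange(desc(...)).
coverage <- coverage[order(-coverage$has_lemma), ]
print(coverage)
```

Instruments near the bottom of such a table are the ones where adding or fixing uni_lemma entries would help most.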