Closed: LinguList closed this issue 2 years ago.
We can probably keep only those languages with at least 50 percent coverage.
This leaves the following:
Variety              Forms  Concepts  Coverage
-------------------  -----  --------  --------
Ava Guaraní            262       214      0.66
Ayoreo                 375       227      0.70
Chamacoco              250       162      0.50
Enlhet                 437       251      0.77
Enxet Sur              333       208      0.64
Guaraní Paraguayo      323       237      0.73
Iyojwa'ja Chorote      358       272      0.84
Iyoʼwujwa Chorote      254       190      0.59
Lule                   294       173      0.53
Maká                   281       242      0.75
Mapudungun             254       206      0.64
Mbya                   222       167      0.52
Mocoví                 297       215      0.66
Nivaclé                373       248      0.77
Pilagá                 286       248      0.77
Quichua Santiagueño    235       176      0.54
Tapiete                271       202      0.62
Toba                   470       273      0.84
Toba-pilagá            367       254      0.78
Wichí                  387       241      0.74
-------------------  -----  --------  --------
We would then ignore:
Variety          Forms  Concepts  Coverage
---------------  -----  --------  --------
Abipón             215       154      0.48
Guaraní Izoceño     24        24      0.07
Iyówuj'wa           45        37      0.11
Kadiweo            224       157      0.48
Toba de Cerrito    179       153      0.47
Vilela              73        60      0.19
---------------  -----  --------  --------
@Bridnicolas, but we could also say that we take languages with at least 150 concepts. In that case, we would ignore only Guaraní Izoceño, Iyówuj'wa, and Vilela.
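A quick way to compare the two criteria is to apply both thresholds to the counts in the two tables above. The sketch below copies those counts by hand; the `STATS` dictionary and the `excluded` helper are illustrative only and not part of the repository's code.

```python
from typing import Dict, List, Optional, Tuple

# variety -> (concepts, coverage), copied by hand from the two tables above
STATS: Dict[str, Tuple[int, float]] = {
    "Abipón": (154, 0.48),
    "Ava Guaraní": (214, 0.66),
    "Ayoreo": (227, 0.70),
    "Chamacoco": (162, 0.50),
    "Enlhet": (251, 0.77),
    "Enxet Sur": (208, 0.64),
    "Guaraní Izoceño": (24, 0.07),
    "Guaraní Paraguayo": (237, 0.73),
    "Iyojwa'ja Chorote": (272, 0.84),
    "Iyoʼwujwa Chorote": (190, 0.59),
    "Iyówuj'wa": (37, 0.11),
    "Kadiweo": (157, 0.48),
    "Lule": (173, 0.53),
    "Maká": (242, 0.75),
    "Mapudungun": (206, 0.64),
    "Mbya": (167, 0.52),
    "Mocoví": (215, 0.66),
    "Nivaclé": (248, 0.77),
    "Pilagá": (248, 0.77),
    "Quichua Santiagueño": (176, 0.54),
    "Tapiete": (202, 0.62),
    "Toba": (273, 0.84),
    "Toba de Cerrito": (153, 0.47),
    "Toba-pilagá": (254, 0.78),
    "Vilela": (60, 0.19),
    "Wichí": (241, 0.74),
}


def excluded(
    min_coverage: Optional[float] = None, min_concepts: Optional[int] = None
) -> List[str]:
    """Return the varieties that fall below any of the given thresholds."""
    out = []
    for name, (concepts, coverage) in STATS.items():
        too_sparse = min_coverage is not None and coverage < min_coverage
        too_few = min_concepts is not None and concepts < min_concepts
        if too_sparse or too_few:
            out.append(name)
    return sorted(out)


print(excluded(min_coverage=0.5))   # 50 percent coverage criterion
print(excluded(min_concepts=150))   # at least 150 concepts criterion
```

With these counts, the 50 percent criterion drops six varieties, while the 150-concept criterion drops only Guaraní Izoceño, Iyówuj'wa, and Vilela.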
@Bridnicolas, can I ask you to check this automated list now?
Number  Variety              Forms  Concepts  Base  BIO  Coverage
------  -------------------  -----  --------  ----  ---  --------
1       Abipón                 215       154   154    0      0.48
2       Ava Guaraní            262       214   214    0      0.66
3       Ayoreo                 375       227   211   16      0.70
4       Chamacoco              250       162   161    1      0.50
5       Enlhet                 437       251   216   36      0.77
6       Enxet Sur              333       208   188   20      0.64
7       Guaraní Paraguayo      323       237   213   24      0.73
8       Iyojwa'ja Chorote      358       272   214   58      0.84
9       Iyoʼwujwa Chorote      254       190   176   14      0.59
10      Kadiweo                224       157   156    1      0.48
11      Lule                   294       173   173    0      0.53
12      Maká                   281       242   199   44      0.75
13      Mapudungun             254       206   206    0      0.64
14      Mbya                   222       167   167    0      0.52
15      Mocoví                 297       215   212    3      0.66
16      Nivaclé                373       248   215   33      0.77
17      Pilagá                 286       248   211   38      0.77
18      Quichua Santiagueño    235       176   162   14      0.54
19      Tapiete                271       202   193    9      0.62
20      Toba                   470       273   216   58      0.84
21      Toba de Cerrito        179       153   153    0      0.47
22      Toba-pilagá            367       254   192   63      0.78
23      Wichí                  387       241   208   33      0.74
My suggestion is to transfer this list to the table in our paper once issues #8 and #7 have been solved. The good thing is that we can automate all the numbers. The bad thing is that it does not look very exhaustive for the basic concepts. But we can probably live with this.
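Since the numbers are meant to be automated, here is a minimal sketch of how such a table could be derived from a long-format wordlist. It assumes one row per form with `variety`, `concept`, and `form` fields, two hand-supplied concept sets for basic and ethnobiological concepts (hypothetical inputs), and the overall total of 324 concepts reported later in this thread. Coverage is the number of attested concepts divided by that total, which reproduces the listed values (for example, 214 / 324 rounds to 0.66 for Ava Guaraní). Letting the two concept sets overlap is only an assumption, though it would be consistent with the few rows where Base + BIO exceeds Concepts by one.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Set, Tuple

TOTAL_CONCEPTS = 324  # overall number of concepts reported in this thread


class Row(NamedTuple):
    """One form in a long-format wordlist (assumed layout)."""
    variety: str
    concept: str
    form: str


def language_stats(
    rows: Iterable[Row],
    basic: Set[str],
    ethnobio: Set[str],
    total: int = TOTAL_CONCEPTS,
) -> List[Tuple[str, int, int, int, int, float]]:
    """Return (variety, forms, concepts, base, bio, coverage) per variety."""
    n_forms = defaultdict(int)   # number of forms (rows) per variety
    attested = defaultdict(set)  # distinct concepts per variety
    for row in rows:
        n_forms[row.variety] += 1
        attested[row.variety].add(row.concept)

    table = []
    for variety in sorted(n_forms):
        concepts = attested[variety]
        table.append((
            variety,
            n_forms[variety],
            len(concepts),
            len(concepts & basic),     # attested basic-vocabulary concepts
            len(concepts & ethnobio),  # attested ethnobiological concepts
            round(len(concepts) / total, 2),
        ))
    return table
```

Counting forms and distinct concepts separately is what lets a variety such as Toba have 470 forms but only 273 concepts: synonyms add forms without adding concepts.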
Yes. Does BIO mean "ethnobiological terms"? If so, some of the counts are wrong: we cannot have 154 Abipón ethnobiological terms, because we actually have none (zero!).
Besides, I'm not sure that ignoring Iyowujwa and Guaraní Izoceño would be right. For those two we only have ethnobiological terms, not basic terms (because they are generally considered dialects of Iyojwaja and Ava Guaraní, respectively), but the ethnobiological terms are numerous and make up a good part of our analysis.
See my updated table. I swapped them :(
Yes, I just have. Sorry
So it is all nice as is: we have some ethnobiological items and basic vocab. Fine. They can later be expanded.
I am sorry for the error.
Below 0.5 there are Abipón, Kadiwéu, and Toba de Cerrito. Are we still planning to ignore them? Or do we transfer the list to the article just as it is?
No, we first need to address the concept issues I just filed. Then we select all languages with more than 150 concepts, so we keep Abipón, etc.
No, wait. I'm looking at the other issues.
That would be a pity. But with only 17 concepts, as for one of the languages, we cannot really work.
I'll have to go to bed now, but I'll continue tomorrow. The good thing is that we now have one more automatic step for checking :)
Ok, till tomorrow. I can't find the GBIF ID for issue #7, but I'll keep looking.
Yes, but there is no ID, at least none visible on the webpage as with the other plants.
Isn't the ID the number in the URL? 7291664
Oops. I didn't pay attention to that.
I already added that number to our file, so no worries :)
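As confirmed above, the GBIF identifier is the number in the species page URL. If more such IDs have to be added to the file, a small helper can pull them out of the URLs. This sketch assumes URLs of the form `https://www.gbif.org/species/<number>`; the function name `gbif_species_key` is hypothetical.

```python
import re
from urllib.parse import urlparse


def gbif_species_key(url: str) -> str:
    """Return the numeric species key from a GBIF species URL (assumed shape)."""
    match = re.search(r"/species/(\d+)", urlparse(url).path)
    if match is None:
        raise ValueError(f"no GBIF species key found in {url!r}")
    return match.group(1)


print(gbif_species_key("https://www.gbif.org/species/7291664"))  # 7291664
```

The key is returned as a string so it can be written to the data file unchanged.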
All forms for 'younger brother' have now been added in the Tokens column. I still have to complete the other columns (form and value), but first I have to join a (hopefully short) meeting. Then I'll continue.
Nice, let me know once it's done and I'll re-run the analysis. That way we can advance the study today, as I am almost done with the paper draft!
'younger brother' is now complete: checked, segmented, and all. I'll now proceed to correct the sources in the document.
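The completeness check done by hand here could also be automated. This sketch assumes the wordlist is a TSV file with `Concept`, `Value`, `Form`, and `Tokens` columns (the column names are an assumption based on this discussion) and reports rows where any of the last three is still empty, optionally restricted to one concept such as 'younger brother'.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Columns that must be filled for every form (assumed names).
REQUIRED = ("Value", "Form", "Tokens")


def incomplete_rows(
    rows: Iterable[Dict[str, str]], concept: Optional[str] = None
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (data row number, missing columns) for rows with empty required cells."""
    for number, row in enumerate(rows, start=1):
        if concept is not None and row.get("Concept") != concept:
            continue
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            yield number, missing


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text with a header line into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```

Row numbers count data rows from 1 and ignore the header line, so they stay stable whether or not a concept filter is used.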
Running the code now!
@Bridnicolas, this looks fine now, and we have 324 concepts. I will need to redo the language statistics later, as I have them at home and forgot to synchronize before going to the office. But we may be able to submit this study on Monday.
That's great.
No, wait, wait. What is "iyówuj'wa" on the list? It's repeated: it is the same as Chorote Iyowujwa.
What list?
Oh dear, can you please check the file etc/languages.tsv and see what is happening there? We must have two entries then!
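To catch entries like this automatically, the language list could be scanned for duplicate IDs and for names that collide once diacritics, apostrophes, and case are ignored. This is a minimal sketch assuming `etc/languages.tsv` is tab-separated with `ID` and `Name` columns, as mentioned in this thread; the helper names are hypothetical. It would not flag one variety listed under two genuinely different names (for example, a bare name vs. one with "Chorote" appended), so a manual look is still needed.

```python
import csv
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASCII apostrophe, modifier letter apostrophe, right single quotation mark
APOSTROPHES = "'\u02bc\u2019"


def normalize(name: str) -> str:
    """Lowercase, strip diacritics and apostrophes, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in APOSTROPHES
    )
    return " ".join(kept.lower().split())


def find_duplicates(
    rows: Iterable[Dict[str, str]]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (IDs used more than once, normalized names used more than once)."""
    names_by_id = defaultdict(list)
    ids_by_name = defaultdict(list)
    for row in rows:
        names_by_id[row["ID"]].append(row["Name"])
        ids_by_name[normalize(row["Name"])].append(row["ID"])
    dup_ids = {k: v for k, v in names_by_id.items() if len(v) > 1}
    dup_names = {k: v for k, v in ids_by_name.items() if len(v) > 1}
    return dup_ids, dup_names


def load_languages(path: str = "etc/languages.tsv") -> List[Dict[str, str]]:
    """Read the tab-separated language list (path as mentioned in the thread)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Usage: `find_duplicates(load_languages())`.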
Ah, yes, check that list: the ID says "Manjuy", so the name should probably be different, right? Can you fix that? Then I'll re-run. Correcting is no problem, and we'll make release version 0.2 then!
Yes. Done. It's correct now.
Oh. You modified the ID as well, but the ID should not be touched!
Are you sure? Not as far as I can see. I only changed Manjui, and Iyówuj'wa to Iyo'wujwa' Chorote, both in the Name column.
Okay, I'll check again.
You modified the geo-coordinates. They are now wrong for Cerriteno and TobaPilaga, and this is why compilation fails. Can you please check?
Yes, now I see. I can't imagine how that happened. I'll fix it now.
Done
Works now
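Both problems above (an edited ID and broken geo-coordinates) could be caught before compilation by comparing the edited `etc/languages.tsv` with its previous version. This sketch assumes `ID`, `Latitude`, and `Longitude` columns (the coordinate column names are an assumption) and takes the file contents as strings, e.g. from the last committed version and the working copy. A changed ID shows up as one removed and one added ID.

```python
import csv
import io
from typing import Dict, List, Tuple

Rows = Dict[str, Dict[str, str]]


def read_languages(text: str) -> Rows:
    """Index the rows of a tab-separated language list by their ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID"]: row for row in reader}


def coordinate_problems(rows: Rows) -> List[Tuple[str, str]]:
    """Report IDs whose coordinates are missing, non-numeric, or out of range."""
    problems = []
    for lang_id, row in rows.items():
        try:
            lat = float(row["Latitude"])
            lon = float(row["Longitude"])
        except (KeyError, TypeError, ValueError):
            problems.append((lang_id, "missing or non-numeric coordinates"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((lang_id, f"out of range: {lat}, {lon}"))
    return problems


def diff_languages(old: Rows, new: Rows, fields=("Latitude", "Longitude")) -> dict:
    """List removed and added IDs and changed fields between two versions."""
    changed = [
        (lang_id, field)
        for lang_id in sorted(old.keys() & new.keys())
        for field in fields
        if old[lang_id].get(field) != new[lang_id].get(field)
    ]
    return {
        "removed_ids": sorted(old.keys() - new.keys()),
        "added_ids": sorted(new.keys() - old.keys()),
        "changed": changed,
    }
```

The range check only catches impossible values; a coordinate that is valid but points to the wrong place still needs the diff against the previous version.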
I should have run this earlier, but now I have, and we need to revise our account of the data:
The last line in the following table is the proportion of concepts attested out of the overall number of 324 concepts: