Hi,
I've noticed that there is no data for either Macroarea or Family for 49 languages in the numerals dataset. I checked them and, unsurprisingly, these are the languages without a glottocode (or with one starting with xxxx).
I went through all of them and managed to add some further data in the attached file (at most: Glottocode, Glottolog_Name, ISO639P3code, Macroarea, and Family, plus a new EZE_COMMENT column; see below).
For all of them, I added the field Macroarea. The country is reported, and I think I assigned each language consistently to the corresponding macroarea (e.g. Indonesia, East Timor, and the Philippines go in Papunesia, not in Asia).
For some of them, I found a glottocode straightforwardly (e.g. Caijia: caji1234).
Some others are dialects according to Glottolog. I marked these with the dialect glottocode when coordinates exist for the dialect (e.g. rang1267), or with the main language's glottocode when dialect coordinates were unavailable (e.g. patt1247, a dialect of tami1289).
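In case it helps whoever automates this, the dialect rule could be sketched roughly like this in Python. It is only an illustration of the rule as I applied it by hand: the function name and the dict-based `coords` lookup are hypothetical, and it assumes the dialect-to-language mapping has already been taken from Glottolog.

```python
def pick_glottocode(dialect_code, parent_code, coords):
    """Prefer the dialect glottocode when it has coordinates, else use the parent language.

    `coords` is a hypothetical lookup: glottocode -> (latitude, longitude).
    """
    lat_lon = coords.get(dialect_code)
    # Use the dialect only when both latitude and longitude are present.
    if lat_lon is not None and None not in lat_lon:
        return dialect_code
    return parent_code
```

For example, with no coordinates recorded for patt1247, `pick_glottocode("patt1247", "tami1289", {})` falls back to `tami1289`.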
Others appear as Bookkeeping. In these cases, I added the (obsolete) glottocode. I’m not sure if this makes sense, so feel free to delete these glottocodes.
For most of the above, Latitude and Longitude are available from Glottolog, but I didn’t fill them in and kept NA as in the original dataset (I'll let someone else do this in an automated, error-free way).
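A minimal sketch of how the coordinates might be filled in automatically, assuming a Glottolog export is available as a CSV with one row per languoid. The column names `id`, `latitude`, and `longitude` for that export are an assumption on my part (they are parameters, so adjust them to the actual file), and `fill_coordinates` and `read_rows` are hypothetical helper names. The dataset columns Glottocode, Latitude, and Longitude are the ones used in the attached file.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def fill_coordinates(rows, glottolog_rows, key="id", lat="latitude", lon="longitude"):
    """Return copies of `rows` with NA Latitude/Longitude filled in by glottocode.

    Rows without a matching glottocode, or whose Glottolog entry has no
    coordinates, are left unchanged (still NA).
    """
    # Only keep Glottolog entries that have both coordinates.
    lookup = {g[key]: (g[lat], g[lon]) for g in glottolog_rows if g[lat] and g[lon]}
    filled = []
    for row in rows:
        row = dict(row)  # copy, so the input is not modified
        code = row.get("Glottocode", "NA")
        if row.get("Latitude", "NA") == "NA" and code in lookup:
            row["Latitude"], row["Longitude"] = lookup[code]
        filled.append(row)
    return filled
```

Usage would be something like `fill_coordinates(read_rows("no.glottocode.csv"), read_rows(glottolog_path))`, where `glottolog_path` points to whichever Glottolog table is used. Values are copied as strings, exactly as they appear in the Glottolog file, so no rounding is introduced.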
For other languages, I couldn’t find any entry in Glottolog (therefore, no glottocode added), but I found the Family in either Ethnologue, the “Comment” section by the contributor, or, worst case scenario, English Wikipedia.
Finally, there are a few for which I didn’t find any clue about their Family, so I left it as NA.
I added the column “EZE_COMMENT” explaining the source for each datapoint.
I hope this helps.
Best!
Ezequiel
no.glottocode.csv