LinguList opened this issue 2 years ago
Hi @LinguList, both @Mamta-Kum and @EnockAppiahTieku are now coding new data (not merging existing datasets, e.g. coding only from numerals). In some cases the base is fairly clear (e.g. in SAND, mostly "decimal", "decimal-vigesimal", and "vigesimal"), but in others, such as the Atlantic-Congo languages from numerals, there is a lot of mixture, with evidence for bases 5, 8, and 10 in the same dataset. So far we have two different formats:
i) separate columns, "Is there evidence for a binary/quinary/... base?", with values "YES"/"NO";
ii) one variable, "Base", with values "binary", "quinary", etc.
Do you see any advantages in the former? (It was proposed by @barlowrussell for his Austronesian data.) Should we conflate data annotated in this way into combined values such as "quinary-decimal" or "binary-quinary-decimal" and continue with a single "Base" variable? Given that the automatic base inference method gives us match proportions for different bases, it makes sense to state the components of mixed bases explicitly.
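To make the two formats concrete: as long as the set of base names is fixed, format (i) and format (ii) can be converted into each other mechanically. The sketch below assumes the base names used in this thread plus "octal" for base 8, and assumes that combined labels list components from the smallest to the largest base (as in "decimal-vigesimal"); `BASES`, `flags_to_label`, and `label_to_flags` are hypothetical names, not part of any existing tool.

```python
# Hypothetical mapping from base names to their numeric value; the ordering
# rule (smallest base first) is an assumption, not an agreed convention.
BASES = {"binary": 2, "quinary": 5, "octal": 8, "decimal": 10, "vigesimal": 20}


def flags_to_label(flags: dict[str, str]) -> str:
    """Collapse format (i), one YES/NO column per base, into a format (ii) "Base" value."""
    present = [name for name, value in flags.items() if value.strip().upper() == "YES"]
    unknown = set(present) - set(BASES)
    if unknown:
        raise ValueError(f"unknown base name(s): {sorted(unknown)}")
    # Order components from smallest to largest base so equal inputs give equal labels.
    return "-".join(sorted(present, key=BASES.__getitem__))


def label_to_flags(label: str) -> dict[str, str]:
    """Expand a combined "Base" value back into one YES/NO flag per known base."""
    components = set(label.split("-")) if label else set()
    unknown = components - set(BASES)
    if unknown:
        raise ValueError(f"unknown base name(s): {sorted(unknown)}")
    return {name: ("YES" if name in components else "NO") for name in BASES}


print(flags_to_label({"binary": "NO", "quinary": "YES", "decimal": "YES"}))  # quinary-decimal
print(label_to_flags("decimal-vigesimal"))
```

Under these assumptions the conversion loses no information as long as every row in format (i) has a value for every base, so choosing between the two formats is mostly a question of convenience during coding.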
I have added some rules for the prototypes in the spreadsheet: https://docs.google.com/spreadsheets/d/1BVaPeszDLoOp_3rrUjH1b5L5AT89z6OXVbedx5RcTWU/edit#gid=332762886
In my opinion, we should be very careful about annotating overly complex systems now in a text-only fashion, without also annotating the forms. But annotating forms requires coming up with a rigorous system for indicating language-internal cognates. We can test this on SAND and some languages you choose, but we cannot use it for the planned paper. So, given the urgency, I'd suggest that you flag systems you have problems with as a human in some way, e.g. by labelling them "mixed" or similar, and we get back to them later, once we have decided on an annotation system that annotates the numeral relations on the basis of the forms.
But this would only be done AFTER the paper has been finished, and data flagged in such a way would NOT be used now.
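In practice, keeping flagged systems out of the paper data can be a simple filter. The sketch below assumes that a coder marks a problematic system by writing "mixed" in the "Base" column; the column name, the flag value, and the helper `split_for_paper` are hypothetical, and the sample rows are placeholders rather than real data.

```python
import csv
import io

# Assumption: a human coder marks a problematic system by writing the
# hypothetical flag value "mixed" in the "Base" column.
FLAG = "mixed"


def split_for_paper(rows):
    """Return (usable, deferred): rows usable now vs. rows flagged for later annotation."""
    usable, deferred = [], []
    for row in rows:
        base = (row.get("Base") or "").strip().lower()
        (deferred if base == FLAG else usable).append(row)
    return usable, deferred


# Hypothetical sample sheet; the language names are placeholders, not real data.
sample = io.StringIO(
    "Language,Base\n"
    "LangA,decimal\n"
    "LangB,mixed\n"
    "LangC,decimal-vigesimal\n"
)
usable, deferred = split_for_paper(csv.DictReader(sample))
print([r["Language"] for r in usable])    # ['LangA', 'LangC']
print([r["Language"] for r in deferred])  # ['LangB']
```

Keeping the deferred rows, rather than deleting them, means they are still at hand once a form-based annotation system is in place.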
I was referring to the tab "relations" for training the base inference algorithm, as you asked some time ago, not the annotations for particular languages in other tabs. Sorry if I was not clear.
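The "relations" tab and the base inference algorithm are not reproduced in this thread. Purely as a hypothetical illustration of what "match proportions for different bases" could mean, the sketch below represents each analysed numeral as value = multiplier × base + addend, computes the share of analyses that use each base, and keeps the bases whose share reaches a threshold as the components of a mixed base. The tuple format, the 0.2 threshold, and all function names are assumptions, not the project's actual method.

```python
from collections import Counter

# Hypothetical stand-in for an analysed numeral: value = multiplier * base + addend.
# This is NOT the format of the "relations" tab, only an illustration.
Analysis = tuple[int, int, int, int]  # (value, base, multiplier, addend)


def base_match_proportions(analyses: list[Analysis]) -> dict[int, float]:
    """Share of analysed numerals whose analysis uses each base."""
    bases = []
    for value, base, multiplier, addend in analyses:
        # Reject analyses whose arithmetic does not add up.
        if multiplier * base + addend != value:
            raise ValueError(f"inconsistent analysis for {value}")
        bases.append(base)
    counts = Counter(bases)
    return {base: n / len(bases) for base, n in counts.items()}


def mixed_components(proportions: dict[int, float], threshold: float = 0.2) -> list[int]:
    """Bases whose share reaches a (hypothetical) threshold, smallest first."""
    return sorted(b for b, share in proportions.items() if share >= threshold)


# Hypothetical analyses: 6 = 1*5+1, 7 = 1*5+2, 12 = 1*10+2, 20 = 2*10+0.
sample = [(6, 5, 1, 1), (7, 5, 1, 2), (12, 10, 1, 2), (20, 10, 2, 0)]
props = base_match_proportions(sample)
print(props)                    # {5: 0.5, 10: 0.5}
print(mixed_components(props))  # [5, 10]
```

Reporting the components as a list (here [5, 10], i.e. "quinary-decimal") rather than a single winning base is what makes it possible to state mixed bases explicitly.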
We now have:
What other types would be important? And could you explain how they work?