numeralbank / numeralbank-analysed

Analyses for the Numeralbank launch paper
Creative Commons Attribution 4.0 International

What other prototypes do we need for numeral systems? #10

Open LinguList opened 2 years ago

LinguList commented 2 years ago

We have now:

What other types would be important? And could you explain how they work?

LEK85 commented 1 year ago

Hi @LinguList, both @Mamta-Kum and @EnockAppiahTieku are coding new data now (not merging datasets, e.g. only from numerals). In some cases it is pretty clear what the base is (e.g. in SAND, mostly "decimal", "decimal-vigesimal", and "vigesimal"), but in others, such as the Atlantic-Congo data from numerals, there is a lot of mixture, with evidence for base 5, 8, and 10 in the same dataset. We have two different formats so far:

i) Different columns: "Is there evidence for a binary/quinary/... base?", values: "YES"/"NO"
ii) One variable "Base", values: "binary", "quinary", etc.

Can you see any advantages of the former? (It was proposed by @barlowrussell for his Austronesian data.) Shall we conflate the data annotated in this way into values such as "quinary-decimal" or "binary-quinary-decimal" and continue with only one "Base" value? Given that the automatic base inference method gives us proportions of match with different bases, it makes sense to clearly state the components of mixed bases.
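To make the comparison of the two formats concrete, here is a minimal Python sketch of how they could be converted into each other. The column names, the label "octal" for base 8, the ordering of the labels, and the fallback value "unknown" are all assumptions made for illustration; they are not taken from the spreadsheet.

```python
# Hypothetical list of base labels, ordered from smallest to largest base.
BASE_ORDER = ["binary", "quinary", "octal", "decimal", "vigesimal"]


def wide_to_base(row):
    """Format i) -> format ii): collapse YES/NO columns into one "Base" value."""
    present = [
        base for base in BASE_ORDER
        if row.get(base, "NO").strip().upper() == "YES"
    ]
    # "unknown" is an assumed placeholder for rows without any YES value.
    return "-".join(present) if present else "unknown"


def base_to_wide(value):
    """Format ii) -> format i): expand a hyphenated "Base" value into columns."""
    parts = set() if value == "unknown" else set(value.split("-"))
    unexpected = parts - set(BASE_ORDER)
    if unexpected:
        raise ValueError(f"unexpected base label(s): {sorted(unexpected)}")
    return {base: "YES" if base in parts else "NO" for base in BASE_ORDER}


row = {"quinary": "YES", "decimal": "YES", "vigesimal": "NO"}
print(wide_to_base(row))
print(base_to_wide("decimal-vigesimal"))
```

As long as the set of labels is fixed, the two formats carry the same information, so the choice mainly affects how easy the data is to enter and to filter.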

LEK85 commented 1 year ago

I have added some rules for the prototypes in the spreadsheet: https://docs.google.com/spreadsheets/d/1BVaPeszDLoOp_3rrUjH1b5L5AT89z6OXVbedx5RcTWU/edit#gid=332762886

LinguList commented 1 year ago

In my opinion, we should be very careful about annotating overly complex systems in a text-only fashion now, without also annotating the forms. But annotating forms requires coming up with a rigorous system for indicating language-internal cognates. We can test this on SAND and some languages you choose, but we cannot use that for the planned paper. So I'd suggest, given the urgency, that you flag systems that you as a human have problems with in some way, e.g., by saying "mixed" or similar, and we get back to them later, when we have decided on an annotation system that annotates the numeral relations on the basis of the forms.

LinguList commented 1 year ago

But this would only be done AFTER the paper has been finished, and data flagged in such a way would NOT be used now.
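The flag-and-defer workflow suggested here could look roughly like the following Python sketch. The record layout, the language names, and the use of the literal value "mixed" as the flag are assumptions for illustration only.

```python
def split_flagged(records, flag="mixed"):
    """Separate records usable now from those deferred until after the paper."""
    usable = [r for r in records if r["Base"] != flag]
    deferred = [r for r in records if r["Base"] == flag]
    return usable, deferred


# Hypothetical annotations; the language names are placeholders.
records = [
    {"Language": "LangA", "Base": "decimal"},
    {"Language": "LangB", "Base": "mixed"},
    {"Language": "LangC", "Base": "vigesimal"},
]

usable, deferred = split_flagged(records)
print([r["Language"] for r in usable])
print([r["Language"] for r in deferred])
```

Keeping the deferred records in a separate list, rather than deleting them, makes it easy to return to them once a form-based annotation system exists.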

LEK85 commented 1 year ago

I was referring to the tab "relations" for training the base inference algorithm, as you asked some time ago, not the annotations for particular languages in other tabs. Sorry if I was not clear.