Open bambooforest opened 5 years ago
Browsing through the raw AA data, I noticed that some segments found only in borrowed words are marked as marginal:
https://github.com/phoible/dev/blob/master/raw-data/AA/AA_inventories.tsv#L465
and less clear cases, such as:
https://github.com/phoible/dev/blob/master/raw-data/AA/AA_inventories.tsv#L4510
Marginality is a gradient notion; see, for example:
Jelaska, Z. and Machata, M. G. (2005). Prototypicality and the Concept Phoneme. Glossos, 6:1–13.
Should we consider marking known borrowings in addition to marginality? This could be a long term goal.
I'd like to revisit marginal status for each source. For example, we do have info on borrowing in SPA, although we don't add it:
https://github.com/phoible/dev/blob/master/scripts/aggregate-raw-data.R#L143-L144
and hence we could potentially infer marginal/borrowed segments in UPSID when the sources overlap (but we don't have to). See also issue #230.
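A rough sketch of what that inference could look like, assuming both sources can be keyed on the same language identifier and segment. The field names (`language`, `segment`, `borrowed`, `anomalous`) and the rows are hypothetical toy values, not the actual columns or contents of the raw data files, and the aggregation script itself is in R, so this only illustrates the logic:

```python
def infer_borrowed(spa_rows, upsid_rows):
    """Copy the SPA 'borrowed' value onto UPSID rows for the same language and segment.

    Rows without an SPA counterpart get borrowed=None (unknown), not False.
    The input rows are left unmodified.
    """
    # Look-up table from (language, segment) to the SPA borrowing value.
    spa_borrowed = {
        (row["language"], row["segment"]): row["borrowed"] for row in spa_rows
    }
    annotated = []
    for row in upsid_rows:
        key = (row["language"], row["segment"])
        # dict.get returns None when SPA has nothing on this language/segment pair.
        annotated.append({**row, "borrowed": spa_borrowed.get(key)})
    return annotated


# Toy rows for illustration only (hypothetical, not real PHOIBLE data).
spa = [
    {"language": "lang1", "segment": "f", "borrowed": True},
    {"language": "lang1", "segment": "p", "borrowed": False},
]
upsid = [
    {"language": "lang1", "segment": "f", "anomalous": True},
    {"language": "lang1", "segment": "p", "anomalous": False},
    {"language": "lang2", "segment": "f", "anomalous": True},
]

annotated = infer_borrowed(spa, upsid)
for row in annotated:
    print(row)
```

Keeping "unknown" (`None`) distinct from `False` matters here: a missing SPA entry says nothing about whether the segment is borrowed.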
For UPSID we denote marginal with the anomalous flag in their raw data:
https://github.com/phoible/dev/blob/master/scripts/aggregate-raw-data.R#L158
but my understanding is that anomalous is a flag for when a segment occurred only once in the database. By that account I don't think we should mark them marginal.
In the other sources, we might also revisit whether an inventory should be marked all FALSE, e.g.