phoible / dev

PHOIBLE data and development.
https://phoible.org/
GNU General Public License v3.0

Revisit marginal status #243

Open bambooforest opened 5 years ago

bambooforest commented 5 years ago

I'd like to revisit marginal status for each source. For example, we do have info on borrowing in SPA, although we don't add it:

https://github.com/phoible/dev/blob/master/scripts/aggregate-raw-data.R#L143-L144

and hence we could potentially infer marginal/borrowed segments in UPSID when the sources overlap (but we don't have to). See also issue #230.
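A minimal sketch of what that inference could look like, in Python rather than the R used in the aggregation script. It assumes the two sources can be joined on a shared language key and an identical segment string; the names `SpaRow`, `language`, `borrowed`, and `infer_upsid_borrowed` are hypothetical and not taken from the PHOIBLE code.

```python
from typing import NamedTuple, Optional


class SpaRow(NamedTuple):
    language: str   # hypothetical join key, e.g. an ISO 639-3 code
    segment: str
    borrowed: bool  # hypothetical column holding SPA's borrowing info


def infer_upsid_borrowed(spa_rows, upsid_rows) -> dict:
    """Map each UPSID (language, segment) pair to SPA's borrowed value.

    The value is None when SPA has no matching row, so "not borrowed"
    and "unknown" stay distinct.
    """
    known = {(r.language, r.segment): r.borrowed for r in spa_rows}
    result: dict[tuple[str, str], Optional[bool]] = {}
    for key in upsid_rows:
        result[key] = known.get(key)
    return result


spa = [SpaRow("aaa", "f", True), SpaRow("aaa", "p", False)]
upsid = [("aaa", "f"), ("aaa", "p"), ("bbb", "f")]
inferred = infer_upsid_borrowed(spa, upsid)
```

Returning `None` for pairs without SPA coverage avoids asserting "not borrowed" where the sources simply do not overlap.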

For UPSID we denote marginal with the anomalous flag in their raw data:

https://github.com/phoible/dev/blob/master/scripts/aggregate-raw-data.R#L158

but my understanding is that anomalous is a flag for a segment that occurs only once in the database. If that is right, I don't think we should mark those segments as marginal.
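One way to check that understanding against the raw data is to count how often each anomalous-flagged segment occurs. This is only a sketch in Python: the row layout `(inventory_id, segment, anomalous)` and the function name `anomalous_not_unique` are assumptions, and "occurs once" is taken to mean one row across all inventories.

```python
from collections import Counter


def anomalous_not_unique(rows):
    """Return anomalous-flagged segments that occur in more than one row.

    rows: iterable of (inventory_id, segment, anomalous) tuples
    (a hypothetical layout, not the actual UPSID raw format).
    """
    rows = list(rows)
    counts = Counter(segment for _, segment, _ in rows)
    return sorted(
        {seg for _, seg, anomalous in rows if anomalous and counts[seg] > 1}
    )


rows = [
    (1, "ʘ", True),
    (1, "p", False),
    (2, "p", False),
    (2, "q", True),
    (3, "q", False),
]
counterexamples = anomalous_not_unique(rows)
```

An empty result would be consistent with the "occurs only once" reading; any segment returned would be a counterexample to it.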

In the other sources, we might also revisit whether an inventory should be marked all FALSE, e.g.

bambooforest commented 4 years ago

Browsing through the raw AA data, I noticed that some segments are noted as marginal because they are found only in borrowed words:

https://github.com/phoible/dev/blob/master/raw-data/AA/AA_inventories.tsv#L465

and less clear cases, such as:

https://github.com/phoible/dev/blob/master/raw-data/AA/AA_inventories.tsv#L4510

Marginality is gradient rather than binary; see, for example:

Jelaska, Z. and Machata, M. G. (2005). Prototypicality and the Concept Phoneme. Glossos, 6:1–13.

Should we consider marking known borrowings in addition to marginality? This could be a long-term goal.
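If we went that way, one possible shape for the data is two independent fields per segment, so that "borrowed" and "marginal" can be recorded separately. The following is a hypothetical sketch in Python; `SegmentStatus` and its field names are illustrative and do not reflect the current PHOIBLE schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentStatus:
    segment: str
    marginal: Optional[bool] = None  # None = source gives no information
    borrowed: Optional[bool] = None  # None = source gives no information


def marginal_through_borrowing(status: SegmentStatus) -> bool:
    """True only when the source marks the segment as both marginal and borrowed."""
    return status.marginal is True and status.borrowed is True


loan_only = SegmentStatus("f", marginal=True, borrowed=True)
unclear = SegmentStatus("ʒ", marginal=True)
```

Keeping the two fields apart means a source that only reports marginality, such as the less clear AA cases above, does not have to commit to a borrowing status.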