ISchema as a shortcut for similar orthographies

LinguList commented 7 years ago

Lingpy distinguishes "schemas" for sound classes, including:

1) one routine for segmentation
2) one routine for conversion to sound classes (and a default sound class model)
3) one default routine for the scoring function in alignments

Currently, lingpy has two schemas: "ipa" and "asjp", the latter working on the ASJP alphabet.

We should add an additional schema in lingpy3, along with the possibility for users to register new schemas:

1) plain ipa (assuming that the orthography is more or less regular IPA)
2) fuzzy ipa (assuming a messy IPA, with aspiration not written as superscript, etc., requiring a segmentation function based on a clean_string strategy)
3) asjp
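To make the idea of user-registered schemas concrete, here is a minimal sketch of what a schema bundle and a registry could look like. Everything in it is hypothetical: `Schema`, `register_schema`, `get_schema` and the toy routines are illustrative names, not existing lingpy or lingpy3 API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Schema:
    """Hypothetical bundle of the three routines a schema provides."""
    name: str
    segment: Callable[[str], List[str]]            # string -> segments
    to_classes: Callable[[List[str]], List[str]]   # segments -> sound classes
    score: Callable[[str, str], float]             # default alignment scorer


# Module-level registry, keyed by schema name (an assumption of this sketch).
_SCHEMAS: Dict[str, Schema] = {}


def register_schema(schema: Schema, overwrite: bool = False) -> None:
    """Add a schema to the registry; refuse silent replacement by default."""
    if schema.name in _SCHEMAS and not overwrite:
        raise ValueError(f"schema {schema.name!r} is already registered")
    _SCHEMAS[schema.name] = schema


def get_schema(name: str) -> Schema:
    """Look up a registered schema by name."""
    try:
        return _SCHEMAS[name]
    except KeyError:
        raise KeyError(f"unknown schema {name!r}") from None


# A toy schema: one character per segment, a two-class vowel/consonant
# model, and a match/mismatch scorer. Purely illustrative.
toy = Schema(
    name="toy",
    segment=list,
    to_classes=lambda segs: ["V" if s in "aeiou" else "C" for s in segs],
    score=lambda a, b: 1.0 if a == b else -1.0,
)
register_schema(toy)
```

A user would then call `get_schema("toy").segment(word)` and so on, and the built-in "ipa" and "asjp" schemas could be registered the same way at import time.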

More schemas are possible, for example "starling", since all of the Tower of Babel data is in its own version of IPA. The main argument for schemas is that it is too time-consuming to write individual orthography profiles for all datasets, while many datasets are consistent enough to be analysed by an enhanced function that is simpler than a full-fledged orthography profile.

SimonGreenhill commented 7 years ago

a sensible 'broad phonemic' schema would be great too.

LinguList commented 7 years ago

I'd assume that we could cover this more or less in "fuzzy" ipa, as this schema will cover cases like:

thoxther > th o x th e r
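As a rough sketch of how such a segmentation could work, assuming a greedy longest-match over a small list of multi-character graphemes (the `segment_fuzzy` name and the multigraph inventory are made up for illustration; this is not the clean_string implementation):

```python
from typing import Iterable, List


def segment_fuzzy(word: str,
                  multigraphs: Iterable[str] = ("th", "kh", "ph", "ts")) -> List[str]:
    """Split a word into segments, preferring known multigraphs.

    Hypothetical helper: at each position the longest matching multigraph
    wins; otherwise a single character is emitted.
    """
    # Try longer multigraphs first; drop empty strings to guarantee progress.
    ordered = sorted((g for g in multigraphs if g), key=len, reverse=True)
    segments: List[str] = []
    i = 0
    while i < len(word):
        for graph in ordered:
            if word.startswith(graph, i):
                segments.append(graph)
                i += len(graph)
                break
        else:
            # No multigraph matched here: fall back to one character.
            segments.append(word[i])
            i += 1
    return segments


print(" ".join(segment_fuzzy("thoxther")))  # th o x th e r
```

A real fuzzy schema would need a much larger inventory and some handling of diacritics, but the longest-match idea stays the same.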

And phonemic transcriptions are usually much lazier about writing unusual Unicode characters than other transcriptions. Or do you have specific other cases in mind?

lingpy / lingpy3

ISchema as a shortcut for similar orthographies #18