Feature vectors for allophones that aren't phonemes

phoible / dev

PHOIBLE data and development.

https://phoible.org/

GNU General Public License v3.0


Feature vectors for allophones that aren't phonemes #368

Open lggruspe opened 1 year ago

lggruspe commented 1 year ago

Some segments appear in the PHOIBLE data as allophones, but not as phonemes in any language.

Examples:

tʃː is an allophone of t̠ʃ in kuna1268
tʂ is an allophone of t̠ʃ in yuch1247
tʂʼ is an allophone of t̠ʃʼ in yuch1247

phoible.csv doesn't seem to have feature vectors for these allophones.
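A minimal sketch of how such segments could be listed, assuming phoible.csv has `Phoneme`, `Allophones` and `Glottocode` columns and that `Allophones` holds a space-separated list (empty or `NA` when nothing is recorded). The function names are hypothetical. The idea is simply to collect every allophone symbol that never occurs in the `Phoneme` column; those are the segments that have no feature-vector row.

```python
import csv
import unicodedata


def _nfd(s):
    """Normalize to NFD so that visually identical symbols compare equal."""
    return unicodedata.normalize("NFD", s)


def find_allophone_only_segments(rows):
    """Map each allophone that never appears as a phoneme to the sorted
    (glottocode, phoneme) pairs it was recorded under.

    Assumes phoible.csv-style rows with "Phoneme", "Allophones" and
    "Glottocode" keys; "Allophones" is treated as space-separated,
    with "" or "NA" meaning no allophones were recorded.
    """
    phonemes = set()
    occurrences = {}  # allophone -> set of (glottocode, phoneme)
    for row in rows:
        phoneme = _nfd(row["Phoneme"])
        phonemes.add(phoneme)
        raw = (row.get("Allophones") or "").strip()
        if raw == "NA":
            continue
        for allophone in raw.split():
            occurrences.setdefault(_nfd(allophone), set()).add(
                (row["Glottocode"], phoneme)
            )
    # Keep only symbols that have no phoneme row anywhere in the data.
    return {a: sorted(where) for a, where in occurrences.items() if a not in phonemes}


def load_rows(path):
    """Read a phoible.csv-style file into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Usage would be along the lines of `find_allophone_only_segments(load_rows("phoible.csv"))`, with the file path being an assumption about where the CSV lives locally.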

drammock commented 1 year ago

That's correct. We have a student working on this right now. But we're not sure yet how to provide them; they can't be part of phoible.csv because it has one row per phoneme (not one per allophone). Can you tell us about your use case / what would be the best format from your perspective?
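For illustration only, one possible layout (hypothetical, not a format the PHOIBLE maintainers have chosen) is a "long" table with one row per phoneme/allophone pair, to which allophone feature columns could later be added. The sketch below derives such rows from phoible.csv-style phoneme rows, assuming the same `Allophones` conventions as above; the column names in `keep` and the helper names are assumptions.

```python
import csv

DEFAULT_KEEP = ("InventoryID", "Glottocode", "Phoneme")


def explode_allophones(rows, keep=DEFAULT_KEEP):
    """Yield one dict per (phoneme, allophone) pair instead of one per phoneme.

    `keep` names the phoneme-level columns copied to every output row;
    a new "Allophone" column holds the individual allophone symbol.
    """
    for row in rows:
        raw = (row.get("Allophones") or "").strip()
        if raw in ("", "NA"):
            continue  # no allophones recorded for this phoneme
        for allophone in raw.split():
            out = {key: row[key] for key in keep}
            out["Allophone"] = allophone
            yield out


def write_long_table(rows, path, keep=DEFAULT_KEEP):
    """Write the exploded rows to a new CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[*keep, "Allophone"])
        writer.writeheader()
        writer.writerows(explode_allophones(rows, keep))
```

The design choice here is to leave phoible.csv untouched and keep allophones in a separate file keyed by inventory and phoneme, so the one-row-per-phoneme invariant still holds.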

lggruspe commented 1 year ago

I was only looking to compare the features of tʂ with ʈʂ. Phoible uses both symbols (possibly to represent different sounds), but Wikipedia says they represent the same sound.
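Once both segments have feature vectors, the comparison itself is a per-feature diff. A minimal sketch, assuming each vector is a dict from feature name (for example, a phoible.csv feature column) to its value string such as "+", "-" or "0"; the helper names are hypothetical. Note that it only works for segments that have a row, which tʂ currently does not.

```python
def feature_vector(row, feature_names):
    """Pull the named feature columns out of one phoible.csv-style row."""
    return {name: row[name] for name in feature_names}


def feature_diff(a, b):
    """Return {feature: (value_in_a, value_in_b)} for every feature whose
    value differs; a feature missing from one vector shows up as None."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```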

drammock commented 1 year ago

tʂ looks like a mistake to me; we try to enforce place-matching between the stop part and the fricative part of affricates. Such mistakes are more likely among the allophones because they aren't run through the same validation code as the phonemes. As I said, though, we have a student working on this right now, so hopefully many of these allophone errors will be corrected soon.

cc @Alessioryan
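The place-matching rule described above can be illustrated with a small check. This is not PHOIBLE's validation code; it is a sketch with a deliberately tiny, hypothetical place table. It splits a segment into base characters with their combining diacritics attached (so t̠ stays one unit), ignores length ː and the ejective ʼ, and compares the places of the two parts.

```python
import unicodedata

# Hypothetical, deliberately tiny place-of-articulation table for illustration.
# "t\u0320" is t + COMBINING MINUS SIGN BELOW (retracted t, as in t̠ʃ).
PLACE = {
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "t\u0320": "postalveolar", "d\u0320": "postalveolar",
    "ʃ": "postalveolar", "ʒ": "postalveolar",
    "ʈ": "retroflex", "ɖ": "retroflex", "ʂ": "retroflex", "ʐ": "retroflex",
}

# Modifier letters that do not affect place (length, ejective); skipped here.
IGNORED = {"ː", "ʼ"}


def split_units(segment):
    """Split a segment into base characters with their combining marks attached."""
    units = []
    for ch in unicodedata.normalize("NFD", segment):
        if ch in IGNORED:
            continue
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # attach diacritic to the preceding base character
        else:
            units.append(ch)
    return units


def affricate_places_match(segment):
    """True/False for a two-part affricate; None if it isn't one or a place is unknown."""
    units = split_units(segment)
    if len(units) != 2:
        return None
    stop_place, fricative_place = (PLACE.get(u) for u in units)
    if stop_place is None or fricative_place is None:
        return None
    return stop_place == fricative_place
```

With this table, `affricate_places_match("tʂ")` returns `False` (alveolar stop, retroflex fricative) while `affricate_places_match("ʈʂ")` returns `True`. A real validator would need a complete place inventory and PHOIBLE's own diacritic conventions.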

Alessioryan commented 1 year ago

@drammock Would you be able to send me the validation code for the phonemes? I'd love to take a look at this issue; I hadn't noticed it before.