phoible / dev

PHOIBLE data and development.
https://phoible.org/
GNU General Public License v3.0

ə and ɜ have the exact same features #352

Open DanielSWolf opened 2 years ago

DanielSWolf commented 2 years ago

In component-feature-table.csv, the segments ə (mid central vowel) and ɜ (open-mid central unrounded vowel) have the exact same features:

ə,0259,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
ɜ,025C,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0

I assume that's not on purpose, given that the FAQ states that "if two phonemes differ in their graphemic representation, then they should necessarily differ in their featural representation as well".
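The claim can be checked mechanically. The following is a minimal Python sketch, not part of the PHOIBLE tooling: it parses the two rows quoted above and compares their feature values. It assumes the first two columns are the segment and its Unicode code point and that every remaining column is a feature value. The helper names `feature_vectors` and `shared_vectors` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted in this issue, copied verbatim.
ROWS = """\
ə,0259,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
ɜ,025C,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
"""


def feature_vectors(text):
    """Map each segment to its tuple of feature values.

    Assumption: column 0 is the segment, column 1 its code point,
    and every later column is a feature value.
    """
    return {row[0]: tuple(row[2:]) for row in csv.reader(io.StringIO(text))}


def shared_vectors(vectors):
    """Return the lists of segments that share one identical feature vector."""
    groups = defaultdict(list)
    for segment, features in vectors.items():
        groups[features].append(segment)
    return [segments for segments in groups.values() if len(segments) > 1]


vectors = feature_vectors(ROWS)
same = vectors["ə"] == vectors["ɜ"]
duplicates = shared_vectors(vectors)
print("identical feature vectors:", same)
print("segments sharing a vector:", duplicates)
```

Running `shared_vectors` over every row of component-feature-table.csv, rather than just these two, should list all such collisions in that file, provided the file follows the column layout assumed here.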

bambooforest commented 2 years ago

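@DanielSWolf -- indeed. This is a problem that we are aware of (hence the "should"). The problem is also pervasive. The following script lists every set of non-tone phonemes in phoible.csv that share a single feature vector: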

library(tidyverse)

# Load the full PHOIBLE data (read_csv accepts a URL directly).
df <- read_csv('https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv')

# Keep Phoneme plus the feature columns (tone through click), one row per distinct combination.
features <- df %>%
  select(7:48, -Allophones, -Source, -Marginal, -SegmentClass) %>%
  distinct() %>%
  filter(tone != "+")  # drop tones

# Group by every feature column and collect the phonemes that share each feature vector.
out <- features %>%
  group_by(across(-Phoneme)) %>%
  summarize(phonemes = paste0(Phoneme, collapse = ', '), count = n(), .groups = 'drop')

# Show only the feature vectors shared by more than one phoneme, largest groups first.
out %>% select(phonemes, count) %>% filter(count > 1) %>% arrange(desc(count))

   phonemes                       count
   <chr>                          <int>
 1 t, t͉, t̠, t̺, t̟, d̥, t̪̺, d̺̥, t̺͉          9
 2 t̻s̻, t̪s̪, ts̪, t̪s, t̪̻s̪̻, t̟ʃ̟, ts̻, t̻s̪̻     8
 3 t̠ʃ, t̠ʃ͉, t̠͉ʃ, d̥ʒ̥, t̻ʃ̻, d̥ʒ̊, ʈ̻ʂ̻         7
 4 ts, t͉s, t̺s̺, t̟s̟, d̥z̥, d̺̥z̺̥, ts̺         7
 5 d̻z̻, d̪z̪, dz̪, d̪ʒ, d̟ʒ̟, d̪z, dz̻         7
 6 ʃ, ʃ͉, ʒ̊, s̠, s̺̠, s̻̠, ʂ̻                7

There are several reasons for this, including but probably not limited to:

@drammock anything else?

We should make this clearer in the FAQ and on the FEATURES page.