Status: Open. Opened by DanielSWolf 2 years ago.
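@DanielSWolf -- indeed. This is a problem that we are aware of (hence the "should" in the FAQ). The problem is also pervasive: grouping the non-tone segments in phoible.csv by their full feature vector turns up many sets of distinct segments with identical features.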
library(tidyverse)
# Load PHOIBLE and keep one row per distinct phoneme together with its feature vector.
df <- read_csv(url('https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'))
df <- df %>% select(7:48, -Allophones, -Source, -Marginal, -SegmentClass) %>% distinct()
# Drop tone segments.
df <- df %>% filter(tone != "+")
# Group by every feature column and list the phonemes that share each feature vector.
out <- df %>% group_by(tone, stress, syllabic, short, long, consonantal, sonorant, continuant, delayedRelease, approximant, tap, trill, nasal, lateral, labial, round, labiodental, coronal, anterior, distributed, strident, dorsal, high, low, front, back, tense, retractedTongueRoot, advancedTongueRoot, periodicGlottalSource, epilaryngealSource, spreadGlottis, constrictedGlottis, fortis, raisedLarynxEjective, loweredLarynxImplosive, click) %>%
  summarize(phonemes = paste0(Phoneme, collapse = ', '), count = n()) %>% ungroup()
# Show only the feature vectors shared by more than one phoneme, largest groups first.
out %>% select(phonemes, count) %>% filter(count > 1) %>% arrange(desc(count))
phonemes count
<chr> <int>
1 t, t͉, t̠, t̺, t̟, d̥, t̪̺, d̺̥, t̺͉ 9
2 t̻s̻, t̪s̪, ts̪, t̪s, t̪̻s̪̻, t̟ʃ̟, ts̻, t̻s̪̻ 8
3 t̠ʃ, t̠ʃ͉, t̠͉ʃ, d̥ʒ̥, t̻ʃ̻, d̥ʒ̊, ʈ̻ʂ̻ 7
4 ts, t͉s, t̺s̺, t̟s̟, d̥z̥, d̺̥z̺̥, ts̺ 7
5 d̻z̻, d̪z̪, dz̪, d̪ʒ, d̟ʒ̟, d̪z, dz̻ 7
6 ʃ, ʃ͉, ʒ̊, s̠, s̺̠, s̻̠, ʂ̻ 7
There are several reasons for this, including but probably not limited to:
@drammock anything else?
We should make this clearer in the FAQ and on the FEATURES page.
In component-feature-table.csv, the segments ə (mid central vowel) and ɜ (open-mid central unrounded vowel) have the exact same features:
I assume that's not on purpose, given that the FAQ states that "if two phonemes differ in their graphemic representation, then they should necessarily differ in their featural representation as well".
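A check for this kind of clash can be sketched in base R along the following lines. This is only an illustration on a toy table: the segment names `A`, `B`, `C`, the three feature columns, and all values are made up and are not taken from component-feature-table.csv, and the column name `segment` is an assumption rather than the file's actual header.

```r
# Toy feature table. Segment names and feature values are hypothetical,
# NOT taken from component-feature-table.csv.
features <- data.frame(
  segment  = c("A", "B", "C"),
  syllabic = c("+", "+", "-"),
  high     = c("-", "-", "+"),
  back     = c("-", "-", "-"),
  stringsAsFactors = FALSE
)

# Collapse each row's feature values into a single key string.
feature_cols <- setdiff(names(features), "segment")
key <- do.call(paste, c(features[feature_cols], sep = "|"))

# Group segment names by key and keep only keys shared by several segments.
groups <- split(features$segment, key)
clashes <- Filter(function(g) length(g) > 1, groups)
print(unname(clashes))
```

To run the same check on the real table, `features` would be read from the CSV file instead, with `"segment"` replaced by whatever the segment column is actually called there.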