Closed by LinguList 1 week ago
Subgroup | Delta | STD |
---|---|---|
all | 0.23 | 0.03 |
Balto-Slavic | 0.35 | 0.04 |
Celtic | 0.20 | 0.07 |
Germanic | 0.26 | 0.04 |
Indo-Aryan | 0.27 | 0.03 |
Romance | 0.35 | 0.03 |

Subgroup | Delta | STD |
---|---|---|
all | 0.26 | 0.04 |
Kiranti | 0.38 | 0.05 |
Kuki-Chin | 0.10 | 0.00 |
Sinitic | 0.32 | 0.07 |
Tani-Yidu | 0.02 | 0.00 |
Tibeto-Dulong | 0.16 | 0.04 |

Subgroup | Delta | STD |
---|---|---|
all | 0.27 | 0.04 |
South Dravidian | 0.36 | 0.05 |

Subgroup | Delta | STD |
---|---|---|
all | 0.14 | 0.02 |
Japonic | 0.24 | 0.04 |
Koreanic | 0.24 | 0.05 |
Mongolic | 0.32 | 0.04 |
Tungusic | 0.27 | 0.03 |
Turkic | 0.34 | 0.03 |

My suspicion could be confirmed:
@SimonGreenhill, do you think this explanation makes sense? Can we test this in any way?
Code is in scripts/deep.py
Looks good. Can you print out N too (i.e. how many languages are in each group)?
Also, can you compare github.com/lexibank/oskolskayatungusic/ to Altaic:Tungusic, and github.com/lexibank/savelyevturkic/ to Altaic:Turkic? This is a direct test of 4, as the Oskolskaya and Savelyev datasets are the ones used in the Altaic dataset. If you wanted an exact test, you could prune each pair (Altaic:Tungusic vs. oskolskayatungusic, Altaic:Turkic vs. savelyevturkic) so that both sides contain exactly the same languages.
(You could also do github.com/lexibank/leejaponic vs Japonic and github.com/lexibank/leekorean vs Korean, but these are from different authors, so not as telling.)
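The pruning step for the exact test could look roughly like the sketch below. It assumes each dataset has already been loaded into a dict keyed by language name, and that the names are harmonized across the two datasets (e.g. via Glottocodes); `prune_to_shared` and the toy cognate-set IDs are hypothetical and not taken from scripts/deep.py.

```python
def prune_to_shared(dataset_a, dataset_b):
    """Restrict two datasets to the languages present in both.

    Both arguments are assumed to be dicts mapping a language name to
    its data (here: a set of cognate-set IDs). Matching is done on the
    dict keys, so the names must already be harmonized.
    """
    shared = sorted(set(dataset_a) & set(dataset_b))
    pruned_a = {lang: dataset_a[lang] for lang in shared}
    pruned_b = {lang: dataset_b[lang] for lang in shared}
    return pruned_a, pruned_b


# Toy data only, not real cognate codings.
altaic_tungusic = {"Evenki": {1, 2}, "Nanai": {2}, "Manchu": {3}}
source_tungusic = {"Nanai": {5}, "Evenki": {6}}

pruned_altaic, pruned_source = prune_to_shared(altaic_tungusic, source_tungusic)
print(sorted(pruned_altaic))  # languages kept on both sides
```

After pruning, the delta scores of the two sides are computed over the same language sample, so any remaining difference comes from the coding rather than from the choice of languages.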
Nexus files will be in phlorest if you'd prefer them.
Hmm, if you really wanted to dig into this, you could compare counts of the patterns, e.g. which patterns in Altaic:Tungusic were removed from Oskolskaya:Tungusic. But this might be a lot of work for little gain.
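A rough sketch of such a pattern comparison, assuming a "pattern" means the set of languages that attest a given cognate set, and that both datasets have been pruned to the same languages. The function names and the input layout (cognate-set ID mapped to languages) are hypothetical.

```python
from collections import Counter


def pattern_counts(cognate_sets):
    """Count how often each language pattern occurs.

    cognate_sets is assumed to map a cognate-set ID to the languages
    that have a reflex in that set.
    """
    return Counter(frozenset(langs) for langs in cognate_sets.values())


def missing_patterns(source, target):
    """Patterns that occur more often in source than in target.

    Counter subtraction keeps only positive counts, so the result lists
    each pattern with the number of occurrences that target lacks.
    """
    return pattern_counts(source) - pattern_counts(target)


# Toy data only: two sets shared by A and B, one set unique to C.
source = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["C"]}
target = {"t1": ["A", "B"]}
for pattern, n in missing_patterns(source, target).items():
    print(sorted(pattern), n)
```

Swapping the two arguments gives the patterns that were added instead of removed.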
Yep, I'd do the checking of subgroups in a different thread, but adding subgroup sizes (some were excluded as they had no quartets) is also important.

Subgroup | Delta | STD | Size |
---|---|---|---|
all | 0.23 | 0.03 | 94 |
Balto-Slavic | 0.35 | 0.04 | 16 |
Celtic | 0.20 | 0.07 | 6 |
Germanic | 0.26 | 0.04 | 17 |
Indo-Aryan | 0.27 | 0.03 | 30 |
Romance | 0.35 | 0.03 | 15 |

Subgroup | Delta | STD | Size |
---|---|---|---|
all | 0.26 | 0.04 | 50 |
Kiranti | 0.38 | 0.05 | 7 |
Kuki-Chin | 0.10 | 0.00 | 4 |
Sinitic | 0.32 | 0.07 | 7 |
Tani-Yidu | 0.02 | 0.00 | 4 |
Tibeto-Dulong | 0.16 | 0.04 | 21 |

Subgroup | Delta | STD | Size |
---|---|---|---|
all | 0.27 | 0.04 | 20 |
South Dravidian | 0.36 | 0.05 | 11 |

Subgroup | Delta | STD | Size |
---|---|---|---|
all | 0.14 | 0.02 | 101 |
Japonic | 0.24 | 0.04 | 16 |
Koreanic | 0.24 | 0.05 | 16 |
Mongolic | 0.32 | 0.04 | 15 |
Tungusic | 0.27 | 0.03 | 22 |
Turkic | 0.34 | 0.03 | 32 |
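For readers who want to reproduce numbers like those in the tables, here is a minimal sketch of a quartet-based delta score in the style of Holland et al. (2002). It is not the code in scripts/deep.py. It assumes a symmetric distance matrix stored as nested dicts, and it assumes that Delta is the mean of the per-language scores and STD their population standard deviation; how the thread's script actually aggregates is not stated. Subgroups with fewer than four languages have no quartets and are skipped.

```python
from itertools import combinations
from statistics import mean, pstdev


def quartet_delta(d, x, y, u, v):
    """Delta of one quartet: 0 for a perfectly tree-like quartet, up to 1."""
    sums = sorted(
        [d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]],
        reverse=True,
    )
    denominator = sums[0] - sums[2]
    # All three sums equal: treat the quartet as tree-like.
    return 0.0 if denominator == 0 else (sums[0] - sums[1]) / denominator


def delta_scores(d, taxa):
    """Per-taxon delta: mean over all quartets that contain the taxon."""
    per_taxon = {t: [] for t in taxa}
    for quartet in combinations(taxa, 4):
        score = quartet_delta(d, *quartet)
        for t in quartet:
            per_taxon[t].append(score)
    return {t: mean(scores) for t, scores in per_taxon.items() if scores}


def summarize(d, taxa):
    """Return (delta, std, size), or None if the group has no quartets."""
    if len(taxa) < 4:
        return None
    values = list(delta_scores(d, taxa).values())
    # Assumption: STD is the spread of the per-taxon scores.
    return mean(values), pstdev(values), len(taxa)


def symmetric(pairs):
    """Helper: build a nested-dict distance matrix from pairwise distances."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


# Toy distances for the tree ((a,b),(c,d)): additive, so delta is 0.
tree = symmetric({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 4,
                  ("a", "d"): 4, ("b", "c"): 4, ("b", "d"): 4})
print(summarize(tree, ["a", "b", "c", "d"]))
```

Running `summarize` once on all languages and once per subgroup gives one row per call, in the same layout as the tables above.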
Following up on the detection by @SimonGreenhill with delta scores, we can easily test this without referring to any other dataset by:
If we find a huge discrepancy, this is a hint (in my opinion) that the data was collected in a bottom-up fashion by considering only proto-forms across subgroups, instead of searching all against all for cognate residues.