Closed LinguList closed 1 week ago
@HansonMenghan, @SimonGreenhill, @RustyGray, @robinryder, do you agree that these numbers look striking? If so, this confirms that cross-subgroup cognates were kept to a minimum in the study by Robbeets et al. The tree-likeness is not real: it is an artifact of a bias in the delta scores that arises when cognate sets are sparse across distantly related languages.
Yes. In the current version of our commentary, we list the delta scores for IE, ST, etc., and the score for TransEA is strikingly the lowest. That is very, very unreasonable!
BTW, I find that the number of cognate sets in the XML files differs from the number reported in the Robbeets manuscript.
Best Regards
张梦翰 博士 Menghan Zhang, Ph.D.
Tel:13901887242
@.***
现代人类学教育部重点实验室 MOE Key Laboratory of Contemporary Anthropology
生命科学学院,复旦大学 School of Life Sciences, Fudan University
地址:上海市杨浦区淞沪路2005号复旦大学江湾校区生命科学学院B603 Address: Room B603, No 2005, Songhu Rd, School of Life Sciences, Jiang Wan Campus, Fudan University, Yangpu District, Shanghai, China
So, given that the experiment I presented here proves that low delta scores reflect sparse coding of cognates across subgroups, we can add it to our reply!
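To make the mechanism concrete, here is a minimal sketch, not the experiment from this thread. It assumes the usual quartet-based delta score (sort the three pairwise distance sums of a quartet as m1 ≤ m2 ≤ m3 and take (m3 − m2) / (m3 − m1)) and, as a further assumption, Jaccard distances over cognate sets. The languages `A1`, `A2`, `B1`, `B2` and their cognate sets are invented toy data.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |shared cognate sets| / |all cognate sets| for two languages."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def quartet_delta(d, x, y, u, v):
    """Delta of one quartet: (m3 - m2) / (m3 - m1) over the sorted pair sums."""
    sums = [d[x, y] + d[u, v], d[x, u] + d[y, v], d[x, v] + d[y, u]]
    m1, m2, m3 = sorted(sums)
    return 0.0 if m3 == m1 else (m3 - m2) / (m3 - m1)


def mean_delta(cognates):
    """Average quartet delta over all quartets of languages."""
    d = {}
    for a, b in combinations(cognates, 2):
        d[a, b] = d[b, a] = jaccard_distance(cognates[a], cognates[b])
    deltas = [quartet_delta(d, *q) for q in combinations(cognates, 4)]
    return sum(deltas) / len(deltas)


# Toy data: subgroups A and B never share a cognate set.
no_cross = {
    "A1": {"fire-1", "nose-1"},
    "A2": {"fire-1", "nose-2"},
    "B1": {"fire-2", "nose-3"},
    "B2": {"fire-2", "nose-4"},
}
# Toy data: same languages, but "nose" cognate sets now cut across subgroups.
with_cross = {
    "A1": {"fire-1", "nose-1"},
    "A2": {"fire-1", "nose-2"},
    "B1": {"fire-2", "nose-1"},
    "B2": {"fire-2", "nose-2"},
}

print(mean_delta(no_cross))    # 0.0
print(mean_delta(with_cross))  # 1.0
```

When no cognate set crosses the subgroup boundary, every cross-subgroup distance is 1, the two largest pair sums tie, and the quartet comes out as perfectly tree-like regardless of what the data would look like with fuller cross-subgroup coding.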
Ah, thanks for mentioning the different number of cognate sets. That confused me too: there is definitely a discrepancy between what is reported in the article and what is actually in the XML files.
Is that not due to excluded borrowings? As they did this more or less manually, going from Excel to the NEXUS file, there was surely some danger of tinkering. Do we have the numbers for the discrepancies? Can we identify meaning slots?
Good find, Mattis! That is indeed striking. This processing of the data will bias not only the estimator of the root age of the whole (putative) Altaic family, but also the estimator of the age of each subfamily, and any phylogeographical estimators.
This processing of the data doesn't affect the structure of the phylogeny, because the Bayesian approach tolerates a low level of noise in the cognate data, but it does inflate the time estimate for the root of the TransEA languages.
Menghan
This processing of the data doesn't affect the structure of the phylogeny, because the Bayesian approach tolerates a low level of noise in the cognate data, but it does increase the variance of the initial divergence time of the TransEA languages.
Menghan
I think there is no horizontal borrowing in Robbeets' data because of her own cognate identification. That is why the relationships among the TransEA languages look tree-like.
Is that not due to excluded borrowings? As they did this manually somehow, from excel to the nexus file, there was surely some danger for tinkering. Do we have the numbers for the discrepancies? Can we identify meaning slots?
Something like this?
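import xml.etree.ElementTree as ET

# List the meaning slot named in each treeLikelihood id of the XML file.
tree = ET.parse("tea254pdcov-ucln-fbd-constrained.xml")
root = tree.getroot()
for neighbor in root.find("run"):
    for data in neighbor.iter("distribution"):
        # Skip elements without an id; the part after "treeLikelihood." is the slot.
        if "treeLikelihood" in data.attrib.get("id", ""):
            print(data.attrib["id"].split("treeLikelihood.")[-1])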
fire
nose
go
water
mouth
tongue
blood
bone
2SG
root
come
breast
rain
1SG
name
louse
wing
meat
arm
fly
night
ear
neck
far
do
house
stone
bitter
say
tooth
hair
big
one
who?
3SG
hit
leg
horn
this
fish
yesterday
drink
black
navel
stand
bite
back
wind
smoke
what?
child
egg
give
new
burn
not
good
know
knee
sand
laugh
hear
soil
leaf
red
liver
hide
skin
suck
carry
ant
heavy
take
old
eat
thigh
thick
long
blow
wood
run
fall
eye
ash
tail
dog
cry
tie
see
sweet
rope
shade
bird
salt
small
wide
star
in
hard
crush
mountain
sit
fingernail
throw
three
right
wash
grasp
branch
man
raw
tomorrow
two
bottom
lie
snake
cloud
year
tear
ask
weave
at
edge
chin
play
cheek
pus
hole
grow
head
belly
shoulder
claw
which?
dig
pull
hot
firewood
remain
cold
feather
cough
thin
grass
foam
sour
full
day
sleep
month
white
sew
kill
jump
throat
woods
there
find
flow
many
chew
swallow
wet
four
soft
look
nasal
that
cut
mother
scratch
sun
brain
warm
cover
woman
deep
above
female
put
other
forehead
left
rise
dry
how?
break
where?
spin
ripe
lick
open
tall
bad
bark
breathe
count
die
dirty
dust
fat
father
fear
fight
five
flower
fog
freeze
fruit
green
guts
heart
here
hunt
ice
lake
live
moon
narrow
near
person
push
river
to
round
sea
seed
sharp
short
sing
sky
smell
smooth
snow
spit
stick
straight
swell
swim
they
think
tree
true
turn
vomit
walk
1PL
when?
with
worm
yellow
2PL
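To put numbers on the discrepancies, a list like the one above could be compared with the concepts in the Excel sheet. The following is only a sketch with invented toy lists: `compare_slots`, `xml_slots` and `excel_slots` are hypothetical names, and how the Excel concepts are read in is left open.

```python
from collections import Counter


def compare_slots(xml_slots, excel_slots):
    """Report slots missing on either side and slots listed more than once."""
    xml_set, excel_set = set(xml_slots), set(excel_slots)
    counts = Counter(xml_slots)
    return {
        "only_in_xml": sorted(xml_set - excel_set),
        "only_in_excel": sorted(excel_set - xml_set),
        "duplicated_in_xml": sorted(s for s, n in counts.items() if n > 1),
    }


# Toy input, not the real data.
xml_slots = ["fire", "nose", "go", "water", "go"]
excel_slots = ["fire", "nose", "water", "hand"]

report = compare_slots(xml_slots, excel_slots)
for key, slots in report.items():
    print(key, len(slots), slots)
```

Concept labels would probably need normalising first (for example `who?` versus `who`), otherwise spelling differences show up as spurious mismatches.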
Robbeets did identify some obvious borrowings in her dataset (they appear with "bor" in EDICTOR), but apparently did not exclude them.
-- Guillaume Jacques
Directeur de recherches CNRS (CRLAO) - EPHE- INALCO https://scholar.google.fr/citations?user=1XCp2-oAAAAJ&hl=fr https://langsci-press.org/catalog/book/295 http://cnrs.academia.edu/GuillaumeJacques http://panchr.hypotheses.org/
If the number of cognate sets is different, we do need to make sure that we use the published version of the data, because we started this project with the pre-print files.
The supplementary Excel table of lexical cognates might be a better basis for further analysis.
The supplementary files in the pre-print version seem to be the same as those in the final version published in Nature.
Well, the Excel file is what we NEED to confirm sound correspondences. It was our core work, and we know that they have used this Excel file to create their NEXUS file. @chrzyki, if you can show me how to bring the cognate sets with meaning slot annotations into a presence-absence matrix, I can check this against the data in the Excel sheet. What I would need is:

concept  cognateset  language1  language2  language3  etc.
hand     1           1          0          0          Ø

Ø would indicate missing data (or use - etc.)
Ideally, you could push the official XML file to raw/ and also add this code in scripts. I'd then compare with the data from the Excel file.
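A rough sketch of the conversion into that layout, assuming the cognate judgements are already available as `(concept, cognateset, language)` triples. That input shape is an assumption rather than what the XML provides directly, and `presence_absence` and the toy records are made up for illustration.

```python
def presence_absence(records, languages):
    """Build rows of [concept, cognateset, one cell per language].

    records: iterable of (concept, cognateset, language) triples.
    A language with no record at all for a concept gets "Ø" (missing data).
    """
    attested = {}  # concept -> languages with any form for that concept
    members = {}   # (concept, cognateset) -> languages in that cognate set
    for concept, cogset, language in records:
        attested.setdefault(concept, set()).add(language)
        members.setdefault((concept, cogset), set()).add(language)
    rows = []
    for (concept, cogset), langs in sorted(members.items()):
        cells = [
            "1" if lang in langs else "0" if lang in attested[concept] else "Ø"
            for lang in languages
        ]
        rows.append([concept, str(cogset)] + cells)
    return rows


# Toy input, not the real data: language4 has no form for "hand".
languages = ["language1", "language2", "language3", "language4"]
records = [
    ("hand", 1, "language1"),
    ("hand", 1, "language2"),
    ("hand", 2, "language3"),
]

rows = presence_absence(records, languages)
print("\t".join(["concept", "cognateset"] + languages))
for row in rows:
    print("\t".join(row))
```

Distinguishing "0" from "Ø" needs to know which languages have any form for a concept, so the missing-data information has to come from the source as well, not only the cognate set memberships.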
This adds most of the relevant things we currently have:
https://github.com/lexibank/robbeetsaltaic/commit/a04df0176c37282ded1d5b0e736080a5a75d9ab2
While adding that I noticed that the kml file on figshare was actually pruned of content for 'security reasons'? See here: https://figshare.com/s/b9c67ca3ea47faf51d48
I'll have a look at how to extract that language-wise, I'm not sure yet.
Yes, this is important because they did correct the files. Recall Simon found that there were too many all zero columns. I alerted Remco to this (it seemed the right thing to do), and they fixed that problem. Cheers, Russell.
No, not quite. They deleted a lot of all-zero columns (not the ones needed for the ascertainment bias correction). Cheers, Russell.
Russell Gray Director, Max Planck Institute for Evolutionary Anthropology Head of the Department of Linguistic and Cultural Evolution TEL: +49-3641-68 68 01 FAX: +49-3641-68 68 68 Departmental Administrators: Jena @. Leipzig @. http://www.shh.mpg.de/2375/en http://language.psy.auckland.ac.nz/ https://scholar.google.com/citations?hl=en&user=sksPd1cAAAAJ
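For checking which version of the data we have, superfluous all-zero columns could be flagged along these lines. This is a sketch under a stated assumption: the first column of each meaning-slot block is taken to be the ascertainment column, which needs to be verified against the actual files. `extra_zero_columns`, `matrix` and `blocks` are hypothetical names and the data are toy values.

```python
def extra_zero_columns(matrix, blocks):
    """Return (concept, column) pairs for all-zero non-ascertainment columns.

    matrix: dict language -> string of "0", "1" or "?" (one char per column).
    blocks: list of (concept, start, end) half-open column ranges.
    Assumption: the first column of each block is the ascertainment column.
    """
    extra = []
    for concept, start, end in blocks:
        for col in range(start + 1, end):  # skip the ascertainment column
            if all(row[col] != "1" for row in matrix.values()):
                extra.append((concept, col))
    return extra


# Toy data: columns 0 and 4 are ascertainment columns, column 3 is empty.
matrix = {
    "language1": "0100010",
    "language2": "0010001",
    "language3": "0100001",
}
blocks = [("hand", 0, 4), ("fire", 4, 7)]

print(extra_zero_columns(matrix, blocks))  # [('hand', 3)]
```

A column counts as empty here when no language has a "1" in it, so columns containing only "0" and "?" are flagged as well.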
Ok, so the difference is the columns of ascertainment bias correction. Thanks for this important message.
Best,
Menghan
On 12/8/2021 @.***> wrote:
No, not quite. They deleted a lot of all zero columns (not the ones needed for the ascertainment bias correction). Cheers, Russelll.
Russell Gray Director, Max Planck Institute for Evolutionary Anthropology Head of the Department of Linguistic and Cultural Evolution TEL: +49-3641-68 68 01 FAX: +49-3641-68 68 68 Departmental Administrators: Jena @. Leipzig @. http://www.shh.mpg.de/2375/en http://language.psy.auckland.ac.nz/ https://scholar.google.com/citations?hl=en&user=sksPd1cAAAAJ
On 8. Dec 2021, at 13:13, Menghan Zhang @.***> wrote:
The supplementary files in pre-print version seems same as those in the final version published in Nature.
Best Regards
On 12/8/2021 19:48,Guillaume @.***> wrote:
If the number of cognate sets is different, we do need to make sure that we use the published version of the data, because we started this project with the preprint files.
On Wed, 8 Dec 2021 at 11:40, Christoph Rzymski @.***> wrote:
Ah, thanks for mentioning the differing number of cognate sets. That confused me too, and there is definitely a discrepancy between what is reported in the article and what is actually in the XML files.
To be clear, some all-zero columns are needed for the ascertainment bias correction, but they had far too many in the preprint version (Simon spotted this). This is probably because they had deleted some languages that had other cognate sets. In a parsimony world this makes no difference. With likelihood it does: it will bias the rate and therefore the date estimates. So I alerted Remco as a collegial thing to do, and I think he fixed it. I haven’t checked this myself, but I assume it is the case. If not, they have even more egg on their face. Cheers, Russell.
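A quick way to check for surplus all-zero columns is to count the columns in which no language shows a `1`. The following is only a minimal sketch: it assumes the binary matrix has already been read from the XML file into a dictionary of equal-length strings over `0`, `1` and `?`, and the function name and toy data are invented for illustration. How many such columns the ascertainment correction legitimately needs depends on how the analysis was partitioned, so the count has to be compared against that by hand.

```python
def columns_without_presence(matrix):
    """Return the indices of columns in which no language has a '1'.

    matrix: dict mapping a language name to a string over '0', '1', '?'
    (hypothetical input format, assumed to be extracted beforehand).
    """
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of columns")
    n_cols = lengths.pop()
    return [
        i for i in range(n_cols)
        if all(seq[i] != "1" for seq in matrix.values())
    ]


# Toy data, invented for illustration only.
toy = {"lang_a": "0100", "lang_b": "0010", "lang_c": "0?00"}
empty = columns_without_presence(toy)
print(len(empty), "columns without any presence:", empty)
```

Columns that contain only `0` and `?` are reported as well, since they carry no attested cognate either.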
Hi Mattis,
Shall we discuss the linguistic part of the Robbeets data from a computational perspective, and how to advance it? For example: the core problems in the linguistic part of her work, the views we ultimately want to present in our commentary, and which quantitative approaches we can adopt to support those views.
Best
Menghan
On 12/8/2021 20:35,Johann-Mattis @.***> wrote:
Well, the Excel file is what we NEED to confirm sound correspondences. It was our core work, and we know that they have used this Excel file to create their nexus file. @chrzyki, if you can show me how to bring the cognate sets with meaning slot annotations into a presence-absence-matrix, I can check this against the data in the Excel sheet. What I would need is:
concept  cognateset  language1  language2  language3  etc.
hand     1           1          0          0          Ø
Ø would indicate missing data (or use - etc.)
Ideally, you could push the official XML file to raw/ and also add this code in scripts. I'd then compare with the data from the Excel file.
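The requested presence-absence matrix could be built along the following lines. This is a minimal sketch, not the script used in the repository: it assumes the cognate judgements have already been extracted into `(language, concept, cognateset)` records with integer cognate set IDs, and the function name and toy records are invented for illustration. A language with no entry at all for a concept gets `Ø` in every row of that concept.

```python
def presence_absence(rows, missing="Ø"):
    """Turn (language, concept, cognateset) records into a table.

    Returns a list of lists: a header line followed by one line per
    (concept, cognateset) pair with '1', '0' or the missing marker.
    """
    rows = list(rows)
    languages = sorted({lang for lang, _, _ in rows})
    attested = {(lang, concept) for lang, concept, _ in rows}
    present = set(rows)
    table = [["concept", "cognateset"] + languages]
    for concept, cogid in sorted({(c, k) for _, c, k in rows}):
        line = [concept, str(cogid)]
        for lang in languages:
            if (lang, concept) not in attested:
                line.append(missing)   # no data for this concept
            elif (lang, concept, cogid) in present:
                line.append("1")       # language has this cognate set
            else:
                line.append("0")       # language has a different set
        table.append(line)
    return table


# Toy records, invented for illustration only.
toy = [
    ("language1", "hand", 1), ("language2", "hand", 1),
    ("language3", "hand", 2),
    ("language1", "foot", 3), ("language2", "foot", 3),
]
for line in presence_absence(toy):
    print("\t".join(line))
```

Writing the table as tab-separated text makes it easy to compare against a presence-absence export from the Excel sheet.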
Yes. I think we should start by unifying our data, so we can make comparative plots. The scripts I use are in the folder scripts, but there is no documentation. Maybe we should start a new GitHub repository devoted only to the analysis? We would add the different datasets there, so we can properly analyze them. I assume you also use Python, right? Should I make the repository and invite you there?
I just invited you to a new repository. We should invite all the others there too, will do so later.
Sorry for the late reply. We can unify our data first and then make the cross-linguistic comparative plots. I use R and Matlab for data analysis; Matlab is more familiar to me. Also, can you add my graduate student, Yuxin Tao, to the GitHub repository? He is a member of my group and has reanalyzed the Robbeets data.
Best Regards
The GitHub ID for Yuxin Tao is thooofiy; his email is @.***
Best Regards
I will add him, and you also have admin rights now, so you can add him yourself. As for the email, it is best to write an email to the three of us, so we can discuss, since GitHub does not seem to allow sharing emails ;)
I think I found the reason for the low delta scores!
The approach is very straightforward: iterate over all cognate sets and, for each concept, retain in each subgroup only THAT cognate set which is the MOST FREQUENT in the subgroup. Only this cognate set is kept for comparison with languages OUTSIDE the subgroup.
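The filtering step described above can be sketched as follows. This is one possible reading of the procedure, not the script that produced the scores: it assumes records of the form `(language, subgroup, concept, cognateset)` with integer cognate set IDs, it simply drops all entries outside the majority cognate set, and it breaks ties in favour of the smallest ID. The function name and toy records are invented for illustration.

```python
from collections import Counter, defaultdict


def retain_most_frequent(rows):
    """Keep, per subgroup and concept, only the most frequent cognate set.

    rows: iterable of (language, subgroup, concept, cognateset) tuples
    (hypothetical input format). Ties go to the smallest cognate set ID.
    """
    rows = list(rows)
    counts = defaultdict(Counter)
    for _, subgroup, concept, cogid in rows:
        counts[(subgroup, concept)][cogid] += 1
    best = {
        # sorted() plus max() returns the smallest ID among tied sets
        key: max(sorted(counter), key=counter.__getitem__)
        for key, counter in counts.items()
    }
    return [row for row in rows if best[(row[1], row[2])] == row[3]]


# Toy records, invented for illustration only.
toy = [
    ("a1", "sub_a", "hand", 1),
    ("a2", "sub_a", "hand", 1),
    ("a3", "sub_a", "hand", 2),  # minority set inside sub_a
    ("b1", "sub_b", "hand", 2),
]
for row in retain_most_frequent(toy):
    print(row)
```

In the toy data, the entry for `a3` is removed, so the only cognate link between `sub_a` and `sub_b` disappears. This is the kind of sparseness across subgroups that the experiment is meant to reproduce.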
The result is below (DELTA Scores!)