lexibank / lsi

CLDF dataset derived from Grierson's "Linguistic Survey of India" from 1928
https://lsi.clld.org
Creative Commons Attribution 4.0 International

Family Trees and evaluation #17

Open PhyloStar opened 4 years ago

PhyloStar commented 4 years ago

I looked at the neighbor-nets for the different families. The reticulation scores are in the range 0.29-0.35, which indicates tree-like signal. Here are my observations:

Next steps (may be) to infer trees,

Finally, show the results and write up the paper. :)

LinguList commented 4 years ago

Yes, this is advancing well. I think we may even do without the neighbor-nets. A nice CLLD app may be even more useful for now; we are currently discussing with @xrotwang how this could be done (one feature under discussion is how to represent sound inventories).

PhyloStar commented 4 years ago

Really cool! If both languages and phoneme inventories can be included in one map it would be really cool.

Do you want me to go ahead and set up MrBayes analysis? I think it is independent of the clld app, right? We will have the results in 2 days or so and then we can evaluate against Glottolog trees...
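To make "evaluate against Glottolog trees" concrete, here is a minimal sketch of one possible comparison. The thread does not name a metric, so the choice of a rooted Robinson-Foulds-style distance (count of clades found in one tree but not the other) is an assumption, as are the nested-tuple tree encoding, the function names, and the placeholder language labels.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a nested-tuple tree.

    A leaf is a string; an internal node is a tuple of children.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by definition
    return all_leaves, found


def rooted_rf(tree_a, tree_b):
    """Count clades present in exactly one of the two rooted trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be defined over the same languages")
    return len(clades_a ^ clades_b)


# Hypothetical four-language example; "L1".."L4" are placeholders,
# not languages from the dataset.
inferred = (("L1", "L2"), ("L3", "L4"))
reference = ((("L1", "L2"), "L3"), "L4")
print(rooted_rf(inferred, reference))
```

A reference classification with many polytomies contains fewer clades than a fully resolved inferred tree, so a raw count like this penalizes resolution; a normalized or quartet-based measure may be preferable in practice.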

LinguList commented 4 years ago

Yes, why not? I'll try and check the code done so far during this week, so we have some review by two people here, which is probably useful.

PhyloStar commented 4 years ago

Done setting up the MrBayes analysis. We can expect results for the larger families within a day. The reference tree still needs to be generated. I hope to get that done soon, so that we can maybe submit the paper by the end of May.

PhyloStar commented 4 years ago

> Yes, this is advancing well. I think we may even do without the neighbor-nets. A nice CLLD app may be even more useful for now; we are currently discussing with @xrotwang how this could be done (one feature under discussion is how to represent sound inventories).

@lingulist Is this app web-based, or is it a Python application? Is there anything I can do? I will try to do my best.

LinguList commented 4 years ago

https://clld.org/2020/05/07/update.html

This is a recent post showing how you can use CLDF to make an app that you can browse. We are still discussing the representation of phoneme inventories, but the other aspects, such as showing words in space, are already there.

You could thus give it a try and see if it works.

xrotwang commented 4 years ago

@PhyloStar what do you mean by "languages and phoneme inventories in one map"? Something similar to PHOIBLE would be pretty straightforward, i.e. charts like here and absence/presence maps per phoneme à la https://phoible.org/parameters/1507B68E1E3108371C1F882C40902AA5#1/26/155

PhyloStar commented 4 years ago

@xrotwang I see. Thanks. I guess @lingulist had a different idea about showing phoneme inventories. I am somewhat confused here.

PhyloStar commented 4 years ago

@xrotwang I had a chat with @lingulist and a CLLD app like PHOIBLE would be really cool to have. How do we start off?

PhyloStar commented 4 years ago

I started to work from here: https://clld.org/2020/05/07/update.html

I am in the virtualenv created for cldfying lsi. I tried to do the 4th step and didn't succeed:

clld initdb development.ini --cldf PATH/TO/cldf/petersonsouthasia/cldf/StructureDataset-metadata.json --glottolog PATH/TO/glottolog/glottolog

Obviously, I need a StructureDataset-metadata.json file under the lsi/cldf/ path. How do I create this JSON file? Is it through cldfbench?

LinguList commented 4 years ago

cldf/cldf-metadata.json

LinguList commented 4 years ago

It's in our lsi directory.

LinguList commented 4 years ago

The metadata file contains all the information on the content of the CLDF data. It is the core of CLDF.

xrotwang commented 4 years ago

Ah, for lsi the metadata is in cldf/cldf-metadata.json. @PhyloStar I can set up an initial repo at clld/lsi later today and add you as a collaborator.

PhyloStar commented 4 years ago

After running setup.py in clld_lsi, I tried it with cldf/cldf-metadata.json and got the error: KeyError: 'CodeTable'

I will wait for the initial repo. There might be something that needs to be fixed.

LinguList commented 4 years ago

Did you try to debug it by tracing the error?

xrotwang commented 4 years ago

@PhyloStar you may have created the project template for the wrong CLDF module, i.e. not for a Wordlist but for a StructureDataset. In that case, any steps down the road would fail. But yes, wait for the initial repo. This functionality is rather new and may still be buggy.
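One way to check which CLDF module a dataset declares before generating the app template is to read the metadata file directly. The sketch below uses only the standard library and the `dc:conformsTo` properties that CLDF metadata carries at dataset and table level; the function names and the toy metadata dictionary are illustrative assumptions, not the contents of the real lsi file.

```python
import json

CLDF_TERMS = "http://cldf.clld.org/v1.0/terms.rdf#"


def load_metadata(path):
    """Read a CLDF metadata JSON file, e.g. cldf/cldf-metadata.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(metadata):
    """Return (module name, sorted component names) declared in the metadata."""
    module = metadata.get("dc:conformsTo", "").replace(CLDF_TERMS, "")
    components = sorted(
        table["dc:conformsTo"].replace(CLDF_TERMS, "")
        for table in metadata.get("tables", [])
        if table.get("dc:conformsTo")
    )
    return module, components


# Toy stand-in for a Wordlist's metadata (assumed shape, heavily trimmed).
toy = {
    "dc:conformsTo": CLDF_TERMS + "Wordlist",
    "tables": [
        {"url": "forms.csv", "dc:conformsTo": CLDF_TERMS + "FormTable"},
        {"url": "languages.csv", "dc:conformsTo": CLDF_TERMS + "LanguageTable"},
        {"url": "parameters.csv", "dc:conformsTo": CLDF_TERMS + "ParameterTable"},
    ],
}

module, components = describe(toy)
print(module, components)
print("CodeTable" in components)
```

If the module is Wordlist and no CodeTable is declared, a template generated for a StructureDataset that looks up a CodeTable would plausibly fail with an error like the KeyError reported above.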

PhyloStar commented 4 years ago

@xrotwang: I did create the template for a StructureDataset. Ah, I see. I will wait.