lingpy / evaluation-paper

Annotating Cognates in Phylogenetic Studies of South-East Asian Languages
GNU General Public License v3.0

Rerun Bayesian Analyses and ML Analysis with New Files #51

Closed: LinguList closed this issue 2 years ago.

LinguList commented 2 years ago

I just updated all the code, including the nexus files, so I suggest re-running the analyses now, as I think we had different versions of the data. The data computed now should reflect the data in EDICTOR and exclude borrowings, as discussed.

LinguList commented 2 years ago

@macyl, you can submit the ML code. Once this has been done, we may also need the log files for the Bayesian analyses, so we can report DensiTree plots, etc.

LinguList commented 2 years ago

Once this is done, I will check our text.

LinguList commented 2 years ago

By "text", I mean the text in the main draft.

Wu-Urbanek commented 2 years ago

Just to make sure I understand this issue correctly:

  1. Re-run all the code?
  2. Re-run MrBayes?
  3. Upload the R script I used to make the ML consensus network?

LinguList commented 2 years ago

Yes, but you do not need to re-run all the code, as my last PR already added all the files, including the nexus files.

Wu-Urbanek commented 2 years ago

I have finished re-running the Bayesian analyses. Do you want all the files or just the logs?

LinguList commented 2 years ago

As there will be many files, I'd kindly ask you to start by providing the consensus trees and DensiTree plots. For the other files, we should discuss where to place them and how to name them.

LinguList commented 2 years ago

@MacyL, I have now updated all the files, so you no longer need to run the Bayesian analyses; the analyses I ran should be sufficient. What I'd ask you to do is the following:

  1. make plots with FigTree
  2. investigate the files for larger differences, e.g., in subgroups or in the ages of subgroups. With the tree age set to 2500 BP, we are interested in larger deviations
  3. make plots with DensiTree, which is also useful
  4. any additional ideas for analyzing the data are welcome

But please only use the files I have just uploaded, so we do not run into too many conflicts.

Wu-Urbanek commented 2 years ago

As I replied in the PR, could you please elaborate:

  1. make plots with FigTree: PDF or PNG? Full and/or part, i.e., full-cognatesetid-con.tre.pdf, part-cognatesetid-con.tre.pdf, or both? Tree age or probability? I usually output both, as full-cognatesetid-con.tre.age.pdf and full-cognatesetid-con.tre.prob.pdf. Do you agree with this naming scheme? Are these going into the manuscript? If so, should the 4 subfigures also be combined into one image?
  2. how do you want me to show you the differences? I can provide a table or list my observations in bullet points. Since this is not a computer-generated file, what should the file be called, e.g., observations.md?
  3. DensiTree is great: run1, run2, or both? Only images? Should the filename be full-cognatesetid.densi.pdf? Or do you want me to list my observations as well, e.g., in observations-densitree.md?
  4. do you need the ML consensus network or not? The images would not be very beautiful, but it is a section in my manuscript. If not, I will drop it. If yes, what should the script and the output files be called? I do not have further ideas for analyzing the trees or the network. The goal of the paper is not to provide extensive tests; it is to provide evidence that reminds people to treat their data properly, instead of blaming Bayesian linguists for using complex algorithms and not reporting beautiful results. (Sorry for the rude words.)
  5. what is the folder for this update: root/ or Bayes/?

LinguList commented 2 years ago
  1. both, all as PDF; the naming is okay, and yes, both are useful. One can also combine them using heatmaps inside FigTree. For the manuscript, I am now considering making our plot with ete in Python, using these updated topologies instead of the bootstrap, so that we report only Bayesian analyses. What do you think?
  2. I would first like you to check the results yourself and come up with some ideas; we can then discuss them.
  3. Yes, if both runs converged, one is enough, I think. The data for both are available anyway.
  4. I would like to see the ML consensus network, but with the updated nexus files, as I think there was a problem with the export earlier. But we will decide later whether to include it.
  5. I would say the figures for the Bayesian analyses should go in the folder bayes/.
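
Whether "both runs converged" can be checked from the split (clade) frequencies of the two runs. MrBayes reports the average standard deviation of split frequencies for this, and values well below 0.01 are often cited as a good sign of convergence. The sketch below is a hypothetical re-implementation, not MrBayes code: each run is assumed to be a dict mapping a clade (a frozenset of taxon names) to its frequency in that run, and the 0.1 cutoff for ignoring rare clades is an assumption.

```python
from statistics import stdev


def asdsf(run_freqs, min_freq=0.1):
    """Average standard deviation of clade frequencies across independent runs.

    run_freqs: one dict per run, clade (frozenset of taxa) -> frequency in [0, 1].
    Clades below min_freq in every run are skipped (assumed cutoff).
    """
    if len(run_freqs) < 2:
        raise ValueError("need at least two independent runs")
    all_clades = set().union(*run_freqs)  # union of the dict keys
    sds = []
    for clade in all_clades:
        # a clade absent from a run has frequency 0 in that run
        values = [freqs.get(clade, 0.0) for freqs in run_freqs]
        if max(values) < min_freq:
            continue
        sds.append(stdev(values))  # sample standard deviation across runs
    return sum(sds) / len(sds) if sds else 0.0


# made-up frequencies for two runs, for illustration only
run1 = {frozenset("AB"): 0.90, frozenset("ABC"): 0.50, frozenset("AD"): 0.05}
run2 = {frozenset("AB"): 0.80, frozenset("ABC"): 0.50}
print(round(asdsf([run1, run2]), 4))  # prints 0.0354
```

Here the clade AD is ignored because it stays below the cutoff in both runs; the remaining value is driven by the 0.9 vs 0.8 disagreement on AB.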
Wu-Urbanek commented 2 years ago
  1. Reporting only the Bayesian analyses would make the manuscript more concise, so it is a good direction. But there is one point I wanted to bring up: the more complex the algorithms we use, the more carefully the input data need to be prepared. Removing the bootstrap results would probably make this point harder to get across. But if our goals are (1) to encourage linguists to express their cognate decisions with a simple annotation scheme that has a strong visual impact, and (2) to raise awareness of preparing cognate-set data carefully for Bayesian analysis (linguists have 4 different choices, and we recommend 3 of the 4), then we don't need the bootstrap results. And thanks for agreeing to the file naming scheme.
  2. I will think about some ideas. My current observation is that loose cognate sets filter out too much information (compared with the other three).
  3. ok.
  4. ok, I will check the code and update it.
  5. understood.

Wu-Urbanek commented 2 years ago

By the way, I had a look at FigTree but did not find a heatmap option. However, the attachment shows a way to combine age and probability: the number is the probability, and the blue bar is the 95% HPD interval of the age. part-commonid-out.con.tre.annotation.pdf
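
The blue bar is a 95% highest posterior density (HPD) interval: the shortest interval that contains 95% of the sampled values, here the sampled ages of a node. A minimal sketch of that estimate from MCMC samples, assuming a unimodal distribution; the function name and the ages are hypothetical, and this is not the code that MrBayes or FigTree actually use.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval covering `mass` of the samples (e.g. node ages in years BP)."""
    if not samples:
        raise ValueError("no samples")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    # number of samples the interval must cover; tiny tolerance guards float error
    k = max(1, math.ceil(mass * n - 1e-9))
    # slide a window of k consecutive sorted samples and keep the narrowest one
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# made-up node ages with one outlier
ages = [2100, 2300, 2350, 2400, 2450, 2500, 2550, 2600, 2700, 4000]
print(hpd_interval(ages, mass=0.9))  # prints (2100, 2700)
```

Unlike an equal-tailed interval, the HPD interval drops the long tail on the side of the outlier instead of trimming both ends equally.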

LinguList commented 2 years ago

Also fine. You can also just color the edges according to their probabilities (if you make them thicker). In this case, the coloring is based on a colormap (I meant "colormap" rather than "heatmap", that is, a gradual transition between colors).

LinguList commented 2 years ago

What you say regarding point 1 is more or less what I'd suggest.

Wu-Urbanek commented 2 years ago

Each edge (branch) is colored by its probability, and each node shows its age. Blue indicates a probability close to 1, and red a probability close to 0. I like this visualization and the color scale. part-commonid-out.con.tre.annotation.pdf
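
The red-to-blue colormap with thicker edges for well-supported branches can be sketched as a linear interpolation between two RGB colors. This is a hypothetical illustration, not FigTree's implementation: the function names, the plain RGB interpolation, and the width range are assumptions, and a tool such as ete would need its own styling calls to apply the resulting colors.

```python
def prob_to_hex(prob):
    """Map a posterior probability to a hex colour: red at 0, blue at 1."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    red, blue = (255, 0, 0), (0, 0, 255)
    # linear interpolation of each RGB channel between red and blue
    rgb = [round(r + (b - r) * prob) for r, b in zip(red, blue)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def edge_width(prob, min_width=0.5, max_width=4.0):
    """Thicker edges for better-supported branches (width range is arbitrary)."""
    return min_width + (max_width - min_width) * prob


# made-up support values for three branches
for clade, prob in {"A,B": 1.0, "A,B,C": 0.5, "C,D": 0.0}.items():
    print(clade, prob_to_hex(prob), edge_width(prob))
```

A probability of 0.5 lands on purple (#800080), so intermediate support is visually distinct from both ends of the scale.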

LinguList commented 2 years ago

Yep. What do you think about this visualization? Do you like it? In any case, that is the one I meant.

Wu-Urbanek commented 2 years ago

This type of visualization is pretty cool. I remember that R can achieve the same effect, but I am not sure whether ETE has a function to color the edges with a transitioning colormap.

LinguList commented 2 years ago

I guess not, but this is not that important. I will check later whether I can adapt the ete rendering to visualize the consensus trees, so you do not need to bother with this. We will check later which fits the paper best, FigTree or ete.

Wu-Urbanek commented 2 years ago

I now have 8 PDF files from FigTree. Should I make a PR (it does not have to be merged), or should I attach the files here in this issue? My observations are written in a Word document. So far, I am done with the 4 trees inferred from the subset of the data, and I will continue writing my observations on the 4 trees inferred from the full dataset.

LinguList commented 2 years ago

I'd say you can just push them directly.

LinguList commented 2 years ago

@MacyL, the results are very encouraging. We can say the following:

  1. the commonid and the loosid show very low resolution, so either we lack phylogenetic signal here or we have many conflicts (it would be interesting to see the consensus networks in dendropy!)
  2. the highest resolution is achieved by the strictid, followed by the salient approach.

For larger time depths, the strictid will lose deep signal (as we count suffixes). So while we recommend strictid for smaller datasets, it is best to carry out a careful salient study for larger datasets.
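
The link between low resolution and conflicting signal can be illustrated by counting clades over the posterior tree sample. A majority-rule consensus keeps only clades found in more than half of the trees (these are always mutually compatible), so a sample split between competing clades gives a poorly resolved consensus; lowering the threshold, in the spirit of a consensus network, keeps the competing clades and exposes them as conflicts. This is a simplified, hypothetical sketch, not the dendropy API: trees are assumed to be given as lists of rooted clades (frozensets of taxon names, trivial clades omitted), whereas real consensus networks are built from splits of unrooted trees.

```python
from collections import Counter
from itertools import combinations


def clade_frequencies(trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {clade: c / len(trees) for clade, c in counts.items()}


def compatible(a, b):
    """Two rooted clades can appear in one tree only if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def supported_clades(freqs, threshold):
    """Clades whose frequency exceeds the threshold."""
    return {clade for clade, f in freqs.items() if f > threshold}


def conflicts(clades):
    """Pairs of clades that cannot be displayed on the same tree."""
    return [(a, b) for a, b in combinations(clades, 2) if not compatible(a, b)]


# a made-up sample of four trees on taxa A-D
trees = [
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AB"), frozenset("ABD")],
    [frozenset("AC"), frozenset("ABC")],
    [frozenset("BC"), frozenset("ABC")],
]
freqs = clade_frequencies(trees)
print(supported_clades(freqs, 0.5))  # majority rule: only {A, B, C} survives
print(len(conflicts(supported_clades(freqs, 0.2))))  # prints 6
```

In this toy sample, the majority-rule tree resolves only one clade, while the lower threshold keeps five clades with six pairwise conflicts, which is the kind of pattern a consensus network would draw as boxes.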

LinguList commented 2 years ago

This is enough for the conclusion. It is also a clear result.