pylogeny / pylogeny

Python Library for Phylogenetic Studies in Linguistics and Beyond
MIT License

code for reading distance matrices and other files #2

Open LinguList opened 3 years ago

LinguList commented 3 years ago

I suppose we use the pylogeny package, which unifies all data, to parse other datasets, e.g., by adding a PHYLIP dst reader, command-line functionality, and readers for other file formats. That way, this package specifically handles data parsing and uses our other libraries to conduct the studies, while the small libraries stay standalone and can also be used in other projects.

So far, we'll need:

  1. Newick reader (simple)
  2. PHYLIP dst format (and beyond); the code already exists in lingpy
  3. NEXUS reader (we can use the nexus reader package by @SimonGreenhill as a dependency)
  4. a basic, very simple wordlist reader, to make alignments once we add them, or to read in cognate set data (can use pycldf as a not-so-big dependency)
  5. reader for step matrices and parsimony code, for which there's no real format, as far as I know
  6. exchange formats for scenarios (in parsimony)

Some of these formats deserve their own issues.
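To make the "simple" Newick reader from item 1 concrete, here is a minimal sketch. The function name `parse_newick` and the nested-dict node layout are assumptions for illustration, not an existing pylogeny API, and the sketch ignores quoted labels, comments, and whitespace before an opening parenthesis.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, children."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip the opening parenthesis
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # this group of children is complete
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        # Optional node label, read up to the next structural character.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after a colon.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected {text[pos]!r} at {pos}")
    return root


def leaf_names(node):
    """Collect tip labels from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("((A:1.0,B:2.0)AB:0.5,C:3.0)root;")
```

Nested dicts keep the sketch dependency-free; a real reader would probably return whatever tree class the package settles on.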

PhyloStar commented 3 years ago

Some code is available here: https://github.com/pylogeny/bayes/blob/master/src/cybayes/commands/utils.pyx

We need to refactor it.

Phylip reading function: https://github.com/pylogeny/bayes/blob/master/src/cybayes/commands/utils.pyx#L69
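A refactored pure-Python reader might look roughly like the sketch below. It is not taken from the linked `utils.pyx`; the name `read_phylip_dst` and the `(taxa, matrix)` return shape are assumptions. It only handles whitespace-separated rows in square or strict lower-triangular layout, not the classic fixed-width 10-character names or rows wrapped over several lines.

```python
def read_phylip_dst(text):
    """Parse a PHYLIP-style distance matrix into (taxa, matrix).

    Accepts a full square matrix or a lower triangle without the diagonal.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])  # first line holds the number of taxa
    rows = lines[1:n + 1]
    if len(rows) != n:
        raise ValueError(f"expected {n} taxa, found {len(rows)}")

    taxa = []
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        name, *cells = row.split()
        taxa.append(name)
        values = [float(cell) for cell in cells]
        if len(values) == n:
            # Square layout: the row is complete as given.
            matrix[i] = values
        elif len(values) == i:
            # Lower-triangular layout: mirror each value across the diagonal.
            for j, distance in enumerate(values):
                matrix[i][j] = matrix[j][i] = distance
        else:
            raise ValueError(f"row {name!r} has {len(values)} values")
    return taxa, matrix


square = """3
A 0.0 0.5 0.7
B 0.5 0.0 0.3
C 0.7 0.3 0.0
"""
lower = "3\nA\nB 0.5\nC 0.7 0.3\n"

taxa, matrix = read_phylip_dst(square)
taxa_lower, matrix_lower = read_phylip_dst(lower)
```

Both layouts produce the same list of taxon names and the same symmetric list-of-lists matrix, so downstream code would not need to know which one was on disk.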

Read Nexus: https://github.com/pylogeny/bayes/blob/master/src/cybayes/commands/utils.pyx#L125

PhyloStar commented 3 years ago

For step matrices, a dictionary format should be good. Something like the scorer dictionary?
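As a rough illustration of the dictionary idea, the sketch below keys a step matrix by `(source_state, target_state)` tuples and plugs it into a small Sankoff-style cost calculation. The key layout, the names `STEPS`, `step_cost`, and `sankoff`, and the nested-tuple tree are all assumptions for illustration; whether this matches the scorer dictionary mentioned above would still need checking.

```python
import math

# Hypothetical step matrix: (source_state, target_state) -> cost.
# Tuple keys allow asymmetric costs; missing pairs count as forbidden.
STEPS = {
    ("a", "a"): 0, ("a", "b"): 1, ("a", "c"): 2,
    ("b", "a"): 1, ("b", "b"): 0, ("b", "c"): 1,
    ("c", "a"): 2, ("c", "b"): 1, ("c", "c"): 0,
}


def step_cost(steps, source, target):
    """Return the transition cost, or infinity if the pair is not listed."""
    return steps.get((source, target), math.inf)


def sankoff(tree, tip_states, steps):
    """Return the minimum parsimony cost for each possible root state.

    `tree` is a nested tuple of tip names; `tip_states` maps tips to states.
    """
    states = sorted({state for pair in steps for state in pair})
    if isinstance(tree, str):
        # A tip costs nothing in its observed state, infinity otherwise.
        return {s: 0 if s == tip_states[tree] else math.inf for s in states}
    costs = {s: 0 for s in states}
    for child in tree:
        child_costs = sankoff(child, tip_states, steps)
        for s in states:
            # Cheapest way to reach any child state t from parent state s.
            costs[s] += min(
                step_cost(steps, s, t) + child_costs[t] for t in states
            )
    return costs


root_costs = sankoff((("A", "B"), "C"), {"A": "a", "B": "c", "C": "a"}, STEPS)
best_cost = min(root_costs.values())
```

A plain dictionary like this is easy to serialize (for example as JSON with joined keys), which may help with the exchange formats for scenarios.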