Closed pranjalv123 closed 9 years ago
What is the use case for this? What systems generate files like this? Is this documented anywhere?
I have some simulated datasets I'm trying to analyze that are like this - from http://www.cs.utexas.edu/~phylo/datasets/astral2/
I'm working on a patch that resolves this; I think I should have it ready relatively soon.
At this point, I would rather not add support for this. The phylogenetic dataspace is already polluted with too many idiosyncratic, poorly/inconsistently/incorrectly/non-documented data formats as well as many standards-violating variants of existing data formats for us to introduce yet another one, which we will have to maintain in perpetuity.
Ideally, the upstream programs should be fixed to generate standards-compliant files, as your patch does. If they cannot, then a pre-processing step where the file is split based on the appropriate regular expression would be the solution.
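For anyone landing here with files like this, a minimal sketch of the pre-processing step suggested above might look like the following. It assumes each matrix starts with a header line made up of exactly two integers (number of taxa, number of characters) and that no sequence line has that shape. The function name `split_phylip_blocks` is hypothetical and not part of any library.

```python
import re

# Assumption: a header is two integers (taxa count, character count) alone on a line.
HEADER = re.compile(r"^\s*\d+\s+\d+\s*$")


def split_phylip_blocks(text):
    """Split text holding several concatenated PHYLIP matrices
    into a list with one string per matrix (hypothetical helper)."""
    blocks = []
    current = []
    for line in text.splitlines():
        # A new header closes the block collected so far.
        if HEADER.match(line) and current:
            blocks.append("\n".join(current) + "\n")
            current = []
        # Skip blank lines that come before the first header.
        if line.strip() or current:
            current.append(line)
    if current:
        blocks.append("\n".join(current) + "\n")
    return blocks
```

Each returned string can then be written to its own file, or handed to the PHYLIP reader one matrix at a time, so the reader itself only ever sees standards-compliant input.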
It would be useful if the PHYLIP reader could read a bunch of character matrices from a single PHYLIP file. For example, a PHYLIP file might look like:
200 345
...200 lines with 345 characters each...
200 221
...200 lines with 221 characters each...
etc.