xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0

Add simple PHYLIP parser #67

Closed xflouris closed 8 years ago

xflouris commented 8 years ago

No need for interleaved format at the moment.

xflouris commented 8 years ago

Done. Reduced to normal PHYLIP only, for simplicity. Interleaved and multi-loci formats are available in the BPP repo; I can add them if someone urgently needs them.

stamatak commented 8 years ago

What are "normal PHYLIP" and "multi-loci format" according to you?

We should support the relaxed PHYLIP as used in RAxML and PhyML.

alexis


Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson

www.exelixis-lab.org

stamatak commented 8 years ago

hi tomas, could you please answer my question above, thanks :-)

alexis

xflouris commented 8 years ago

Hi Alexi, by "normal" I meant the format in which each sequence is on exactly one line, following the sequence name (I guess this is also called sequential format). By "multi-loci format" I meant multiple PHYLIP files concatenated together in one file.

Interleaved PHYLIP (sequences span multiple lines and continue after all taxa have been defined), if that is what you mean by relaxed, is not supported, although it would take only a couple of additional rules in the bison grammar file. I can add it if you like.
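To make the two terms concrete, here is a minimal Python sketch of the "normal" format (one sequence per line after its name) and of the "multi-loci" case (several such blocks concatenated). It is an illustration only, not the bison grammar used in libpll; the function name `parse_simple_phylip` and the assumption that every block starts with a "number of taxa, number of sites" header line are mine.

```python
def parse_simple_phylip(text):
    """Parse one or more concatenated PHYLIP blocks in which every
    sequence sits on exactly one line after its name.

    Returns a list of loci; each locus is a list of (name, sequence).
    """
    # Blank lines are ignored so that blocks may be separated by them.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    loci = []
    i = 0
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2:
            raise ValueError(f"bad header line: {lines[i]!r}")
        ntaxa, nsites = int(header[0]), int(header[1])
        if i + 1 + ntaxa > len(lines):
            raise ValueError("file ends before all taxa were read")

        records = []
        for ln in lines[i + 1 : i + 1 + ntaxa]:
            # Name is the first whitespace-delimited token; the rest of
            # the line is the sequence (internal blanks are dropped).
            name, rest = ln.split(None, 1)
            seq = "".join(rest.split())
            if len(seq) != nsites:
                raise ValueError(f"{name}: expected {nsites} sites, got {len(seq)}")
            records.append((name, seq))

        loci.append(records)
        i += 1 + ntaxa  # jump to the next block's header, if any
    return loci
```

A single-locus file simply yields a list with one element, so the same function covers both cases.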

stamatak commented 8 years ago

Yes, please add the relaxed format: taxon names of arbitrary length, followed by one or more whitespace characters, followed by the sequences in either sequential or interleaved format. It is the de facto standard now.

alexis
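As a rough illustration of the relaxed rule described above, the Python sketch below contrasts it with the classic PHYLIP convention, in which the name occupies the first ten columns of the line. The function names `split_strict` and `split_relaxed` are hypothetical; this is not how RAxML, PhyML, or libpll implement it.

```python
def split_strict(line):
    """Classic PHYLIP: the name is the first 10 columns (blank-padded),
    everything after column 10 is sequence data."""
    name = line[:10].strip()
    seq = "".join(line[10:].split())
    return name, seq


def split_relaxed(line):
    """Relaxed PHYLIP: the name is the first whitespace-delimited token
    of any length; one or more whitespace characters separate it from
    the sequence data."""
    parts = line.split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"no sequence data on line: {line!r}")
    name, rest = parts
    return name, "".join(rest.split())
```

With the relaxed rule a name cannot contain whitespace, because the first run of whitespace always ends the name.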


lczech commented 8 years ago

Also, don't forget that in sequential mode the sequences can continue over multiple lines. @xflouris, you said "each sequence is on exactly one line following the sequence name", but in both sequential and interleaved mode the sequences themselves can span multiple lines. The difference between the two variants is the order of those lines: sequential is like FASTA (label, sequence, label, sequence, ...), while interleaved is, well, interleaved (label, bit of sequence, label, bit of sequence, ..., bit more of first sequence, ...).

See http://evolution.genetics.washington.edu/phylip/doc/sequence.html for the format description. Also, if you want C++, you can use my reader: https://github.com/lczech/genesis/blob/master/lib/sequence/io/phylip_reader.cpp ;-)
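The difference in line order can be sketched in a few lines of Python. This is a simplified illustration, not the genesis reader or libpll code: the function names are hypothetical, names follow the relaxed rule (first token on the line), blank lines are ignored, and continuation lines are assumed to carry sequence data only, without a repeated label.

```python
def _header_and_body(text):
    lines = text.splitlines()
    ntaxa, nsites = (int(tok) for tok in lines[0].split()[:2])
    body = [ln for ln in lines[1:] if ln.strip()]  # drop blank lines
    return ntaxa, nsites, body


def _check(records, nsites):
    for name, seq in records:
        if len(seq) != nsites:
            raise ValueError(f"{name}: expected {nsites} sites, got {len(seq)}")
    return records


def parse_sequential(text):
    """label, all lines of sequence 1, label, all lines of sequence 2, ..."""
    ntaxa, nsites, body = _header_and_body(text)
    records, i = [], 0
    for _ in range(ntaxa):
        name, rest = body[i].split(None, 1)
        seq = "".join(rest.split())
        i += 1
        # Keep consuming lines until this taxon has all of its sites.
        while len(seq) < nsites:
            if i >= len(body):
                raise ValueError(f"{name}: file ends before sequence is complete")
            seq += "".join(body[i].split())
            i += 1
        records.append((name, seq))
    return _check(records, nsites)


def parse_interleaved(text):
    """First block: label + chunk per taxon; later blocks: chunks only,
    in the same taxon order."""
    ntaxa, nsites, body = _header_and_body(text)
    names, seqs = [], []
    for ln in body[:ntaxa]:
        name, rest = ln.split(None, 1)
        names.append(name)
        seqs.append("".join(rest.split()))
    # Line k after the first block belongs to taxon k modulo ntaxa.
    for k, ln in enumerate(body[ntaxa:]):
        seqs[k % ntaxa] += "".join(ln.split())
    return _check(list(zip(names, seqs)), nsites)
```

Both functions return the same records for the same alignment; only the order in which lines are assigned to taxa differs. The sequential reader relies on the site count from the header to know where one sequence ends and the next label begins.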

ziheng-yang commented 8 years ago

"followed by one or more whitespaces": can this be changed to "followed by two or more whitespaces", since a sequence name may include (single) spaces? This is the rule I use. ziheng
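A small Python sketch of the "two or more whitespaces" rule, to show what it changes compared with splitting on the first whitespace. The function name `split_two_space` is hypothetical and the regular expression is just one possible way to express the rule, not the code used in BPP or libpll.

```python
import re


def split_two_space(line):
    """Split a record where the name may contain single spaces and is
    separated from the sequence by a run of two or more whitespace
    characters."""
    parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"no two-whitespace separator in: {line!r}")
    name, rest = parts
    # Remaining blanks inside the sequence data are only for grouping.
    return name, "".join(rest.split())


line = "Homo sapiens  ACGT ACGT"

# Two-or-more rule: the single space stays inside the name.
two_space = split_two_space(line)

# One-or-more rule: the name is cut at the first space and the rest of
# the name ends up in the sequence data.
first, rest = line.split(None, 1)
one_space = (first, "".join(rest.split()))
```

The trade-off is that a file whose names are separated from the sequences by a single space or a single tab is rejected under this rule.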
