iTaxoTools / TaxI2

Calculation and analysis of pairwise sequence distances
GNU General Public License v3.0

Add support for spart.xml #35

Open mvences opened 2 years ago

mvences commented 2 years ago

I believe TaxI3 already exports the results of the current simple clustering algorithm in spart format.

When we defined the spart format (Miralles et al. 2021), we also defined an XML version of it, which is more flexible and can be extended to carry much more information than classical spart.

We should plan to support spart.xml in TaxI3 in two ways:

  1. In addition to the regular spart output, any clustering performed in the program should also produce a spart.xml file.
  2. The program should also accept spart.xml as an input file.
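As a rough illustration of the first point, the sketch below serializes a clustering result to XML. It is only a sketch: the element and attribute names (`root`, `spartition`, `subset`, `individual`, `label`, `ref`) and the function `clusters_to_xml` are hypothetical placeholders, not the actual spart.xml schema and not existing TaxI3 code.

```python
import xml.etree.ElementTree as ET


def clusters_to_xml(clusters, label="clustering"):
    """Serialize {subset label: [sequence names]} to an XML string.

    All element and attribute names here are hypothetical; the real
    spart.xml layout must be taken from the format definition.
    """
    root = ET.Element("root")
    partition = ET.SubElement(root, "spartition", label=label)
    for subset_label, names in clusters.items():
        # One <subset> per cluster (= putative species)
        subset = ET.SubElement(partition, "subset", label=str(subset_label))
        for name in names:
            # One <individual> per sequence assigned to this subset
            ET.SubElement(subset, "individual", ref=name)
    return ET.tostring(root, encoding="unicode")
```

For example, `clusters_to_xml({"1": ["seqA", "seqB"], "2": ["seqC"]})` would produce one partition with two subsets.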

For the second point, however, the format still needs to be developed (I will try to do this with colleagues in the coming weeks). For instance, we may decide that DNA sequences can be included alongside species partitions in spart.xml, in which case the file could hold the same information as a tab file (and possibly more). Alternatively, the file may specify only a species partition and the sequence names assigned to each subset (= species); a user would then upload the spart.xml together with a fasta file, so that the two files combined carry the same information as a tab file does now.
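The second alternative (partition file plus fasta file) could look roughly like the sketch below, which joins subset assignments with sequences by name. Since the extended format is not yet defined, the XML layout (`subset` elements with a `label`, containing `individual` elements with a `ref`) is purely an assumption, and the helper names `read_partition`, `read_fasta` and `combine` are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_partition(xml_text):
    """Return {sequence name: subset label} from a hypothetical XML layout."""
    root = ET.fromstring(xml_text)
    assignment = {}
    for subset in root.iter("subset"):
        for individual in subset.iter("individual"):
            assignment[individual.get("ref")] = subset.get("label")
    return assignment


def read_fasta(fasta_text):
    """Return {sequence name: sequence}; the name is the first word of the header."""
    sequences = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            sequences[name] = ""
        elif name is not None:
            # Sequence data may be wrapped over several lines
            sequences[name] += line
    return sequences


def combine(xml_text, fasta_text):
    """Return (name, subset, sequence) rows; subset is None if unassigned."""
    assignment = read_partition(xml_text)
    return [
        (name, assignment.get(name), seq)
        for name, seq in read_fasta(fasta_text).items()
    ]
```

Each returned row holds a sequence name, its species subset and the sequence itself, which is roughly the information described above for a tab file. Sequences missing from the partition are kept with a subset of `None` rather than dropped, so that mismatched names are easy to spot.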

This issue is not a priority. We first need to define the extended version of the spart.xml format more precisely. Maybe we can come back to this in late January.