add a csv file with alignment information

roblanf commented 1 year ago

At the moment all the useful information is in nexus format, which can be annoying to work with.

E.g. we have this:

begin SETS;

    [partitions]
    CHARSET COI_1stpos = 1-1592\3;
    CHARSET COI_2ndpos = 2-1592\3;
    CHARSET COI_3rdpos = 3-1592\3;
    CHARSET 16S = 1593-3037;

    [loci]
    CHARPARTITION COI = 1:COI_1stpos, 2:COI_2ndpos, 3:COI_3rdpos;
    CHARPARTITION 16S = 1:16S;

    CHARPARTITION loci = 1:COI, 2:16S;

    [genomes]
    CHARPARTITION mitochondrial_genome = 1:COI, 2:16S;

    CHARPARTITION genomes = 1:mitochondrial_genome;

But this could be represented as a csv file with the following columns:

alignment_name (e.g. "Anderson_2012")
partition (e.g. "COI_1stpos")
partition_sites (e.g. "1-1592\3")
locus (e.g. "COI")
genome (e.g. "mitochondrial")

We could then use the csv file when entering the data, and build the nexus block directly from the csv file.
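A rough sketch of how the nexus block might be built from such a csv, using the five columns listed above. The function name `build_sets_block` is hypothetical, the `_genome` suffix is inferred from the example block, and the closing `end;` is assumed because it is the usual nexus block terminator; none of this is existing code in the repository.

```python
import csv
import io
from collections import OrderedDict


def build_sets_block(csv_text):
    """Build a nexus SETS block from csv text (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    loci = OrderedDict()     # locus name -> partition names, in file order
    genomes = OrderedDict()  # genome name -> locus names, in file order

    lines = ["begin SETS;", "", "    [partitions]"]
    for row in rows:
        lines.append(f"    CHARSET {row['partition']} = {row['partition_sites']};")
        loci.setdefault(row["locus"], []).append(row["partition"])
        locus_names = genomes.setdefault(row["genome"], [])
        if row["locus"] not in locus_names:
            locus_names.append(row["locus"])

    def numbered(names):
        # Produces "1:a, 2:b, ..." as used by CHARPARTITION in the example.
        return ", ".join(f"{i}:{name}" for i, name in enumerate(names, start=1))

    lines += ["", "    [loci]"]
    for locus, partitions in loci.items():
        lines.append(f"    CHARPARTITION {locus} = {numbered(partitions)};")
    lines.append(f"    CHARPARTITION loci = {numbered(loci)};")

    lines += ["", "    [genomes]"]
    for genome, locus_names in genomes.items():
        # Assumption: genome partitions are named "<genome>_genome".
        lines.append(f"    CHARPARTITION {genome}_genome = {numbered(locus_names)};")
    lines.append(f"    CHARPARTITION genomes = {numbered(g + '_genome' for g in genomes)};")

    lines += ["", "end;"]  # assumption: standard nexus block terminator
    return "\n".join(lines)


# Raw string so the backslash in "1-1592\3" is kept literally.
example = r"""alignment_name,partition,partition_sites,locus,genome
Anderson_2012,COI_1stpos,1-1592\3,COI,mitochondrial
Anderson_2012,COI_2ndpos,2-1592\3,COI,mitochondrial
Anderson_2012,COI_3rdpos,3-1592\3,COI,mitochondrial
Anderson_2012,16S,1593-3037,16S,mitochondrial
"""

print(build_sets_block(example))
```

With the four example rows, this should reproduce the CHARSET and CHARPARTITION lines shown earlier.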

roblanf commented 1 year ago

Also include a column for 'datatype', e.g. DNA, AA, etc. This comes from the top of the nexus alignment file.
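One possible way to pull the datatype out of the alignment file, assuming it is declared in the usual `format datatype=...` statement near the top. `read_datatype` is a hypothetical helper, and the header text below is made up for illustration. Nexus files often write `protein` rather than `AA`, so some mapping to the csv values may be needed.

```python
import re


def read_datatype(nexus_text):
    """Return the datatype named in a nexus FORMAT statement, or None."""
    match = re.search(
        r"\bformat\b[^;]*?\bdatatype\s*=\s*(\w+)",
        nexus_text,
        flags=re.IGNORECASE,
    )
    return match.group(1).upper() if match else None


# Illustrative header only; not taken from a real alignment in the repo.
header = """#NEXUS
begin DATA;
    dimensions ntax=10 nchar=3037;
    format datatype=DNA missing=? gap=-;
"""

print(read_datatype(header))
```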

DS4B-ANU commented 11 months ago

include a column for codon position too (NA if it's not a codon position), so now the columns are:

alignment_name (e.g. "Anderson_2012")
partition_name (e.g. "COI_1stpos")
partition_start (e.g. 1)
partition_end (e.g. 100)
partition_skip (e.g. 3; so if start is 1, end is 100, and skip is 3, the nexus format would be 1-100\3)
locus_name (e.g. "COI")
genome (e.g. "mitochondrial")
data_type (e.g. "DNA", "AA", "RNA")
codon_position (e.g. 1, 2, 3, or NA when it's not protein-coding)
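A small sketch of converting between the start/end/skip columns and the nexus site notation, in both directions. The function names are hypothetical, and treating an empty or "NA" skip as "no skip" is an assumption, since the thread does not say what goes in `partition_skip` for contiguous partitions.

```python
import re


def sites_to_nexus(start, end, skip=None):
    """Format start/end/skip as nexus sites, e.g. 1, 100, 3 gives 1-100\\3."""
    if skip in (None, "", "NA"):  # assumption: these all mean "no skip"
        return f"{start}-{end}"
    return f"{start}-{end}\\{skip}"


def nexus_to_sites(sites):
    """Split nexus sites like '1-1592\\3' into (start, end, skip or None)."""
    match = re.fullmatch(r"(\d+)-(\d+)(?:\\(\d+))?", sites.strip())
    if match is None:
        raise ValueError(f"unrecognised site range: {sites!r}")
    start, end, skip = match.groups()
    return int(start), int(end), (int(skip) if skip else None)


print(sites_to_nexus(1, 100, 3))
print(nexus_to_sites(r"1-1592\3"))
```

This only handles a single `start-end` range with an optional skip; charsets made of several ranges would need more than one row or a different representation.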

roblanf / BenchmarkAlignments

add a csv file with alignment information #47