Check whether characters are in the alphabet when reading FASTA, and filter them out if they are not.
This has a problem: the sequences come out with different lengths, so we can no longer deduce the alignment length. I think filtering is wrong here, because the input is supposed to contain aligned sequences.
Currently fails:
```
+/workdir/.build/docker/release/treetime ancestral --method-anc=parsimony --tree=data/lassa/L/50/tree.nwk --outdir=tmp/smoke-tests/ancestral/parsimony/lassa/L/50 data/lassa/L/50/aln.fasta.xz
Error:
0: When calculating length of sequences
1: Sequences are expected to all have the same length, but found the following lengths:
Length 845:
"MK107855"
Length 871:
"MK107845"
Length 873:
"MH887995"
```
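A minimal sketch of how the alignment length could be deduced, and how an error like the one above could be assembled, assuming a hypothetical `FastaRecord { name, seq }` type (this is not treetime's actual reader). Grouping names by length in a `BTreeMap` yields the sorted per-length listing seen in the output; once per-character filtering is applied, any sequence that lost characters ends up in its own length group.

```rust
use std::collections::BTreeMap;

/// Hypothetical record type, for illustration only.
struct FastaRecord {
    name: String,
    seq: String,
}

/// Returns the common sequence length, or an error listing which sequences
/// have which length. Uses byte length, which assumes ASCII sequences.
fn alignment_length(records: &[FastaRecord]) -> Result<usize, String> {
    // Group sequence names by length; BTreeMap keeps the lengths sorted.
    let mut by_len: BTreeMap<usize, Vec<&str>> = BTreeMap::new();
    for r in records {
        by_len.entry(r.seq.len()).or_default().push(&r.name);
    }
    match by_len.len() {
        0 => Err("No sequences found".to_owned()),
        // Exactly one distinct length: that is the alignment length.
        1 => Ok(*by_len.keys().next().unwrap()),
        _ => {
            let mut msg = String::from(
                "Sequences are expected to all have the same length, but found the following lengths:\n",
            );
            for (len, names) in &by_len {
                msg.push_str(&format!("Length {len}:\n"));
                for name in names {
                    // Debug formatting quotes the name, as in the output above.
                    msg.push_str(&format!("  {name:?}\n"));
                }
            }
            Err(msg)
        }
    }
}
```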
Now only ebola fails (`Makona-UK3` contains the nucleotide `U`; none of the others do):
```
+/workdir/.build/docker/release/treetime ancestral --method-anc=marginal --dense=true --model=jc69 --tree=data/ebola/tree.nwk --outdir=tmp/smoke-tests/ancestral/marginal/ebola data/ebola/aln.fasta.xz
Error:
0: When calculating length of sequences
1: Sequences are expected to all have the same length, but found the following lengths:
Length 13915:
"Makona-UK3"
Length 19006:
"EM_COY_2015_015982"
"G3676"
"EM_COY_2015_015980"
"G3670"
"CON-10590"
"NM042"
"EM_079497"
<remaining sequence names here>
```
We could do character replacement instead (with a gap? with the unknown character? Either way, the choice still depends on the alphabet).
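A minimal sketch of the replacement idea, assuming a hypothetical nucleotide alphabet (IUPAC codes plus `-`, deliberately without `U`, mirroring the ebola case) and a caller-chosen replacement character such as `'N'` or `'-'`. Because every input character maps to exactly one output character, the length is preserved and the alignment length can still be deduced. `check_alphabet` is a hypothetical fallible variant for comparison, which reports offending characters instead of rewriting them; neither function is treetime's API.

```rust
use std::collections::HashSet;

/// Hypothetical nucleotide alphabet: IUPAC codes plus gap, without U.
fn nuc_alphabet() -> HashSet<char> {
    "ACGTRYSWKMBDHVN-".chars().collect()
}

/// Replaces every character outside `alphabet` with `unknown`.
/// Input is uppercased first (an assumption), and the character count is unchanged.
fn replace_unknown_chars(seq: &str, alphabet: &HashSet<char>, unknown: char) -> String {
    seq.chars()
        .map(|c| {
            let c = c.to_ascii_uppercase();
            if alphabet.contains(&c) { c } else { unknown }
        })
        .collect()
}

/// Fallible variant: errors if any character is outside `alphabet`,
/// reporting how many there are and the first position (0-based).
fn check_alphabet(name: &str, seq: &str, alphabet: &HashSet<char>) -> Result<(), String> {
    let bad: Vec<(usize, char)> = seq
        .chars()
        .enumerate()
        .filter(|(_, c)| !alphabet.contains(&c.to_ascii_uppercase()))
        .collect();
    if bad.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "Sequence {name:?} contains {} character(s) outside the alphabet, first at position {}: {:?}",
            bad.len(),
            bad[0].0,
            bad[0].1
        ))
    }
}
```

Replacing keeps `Makona-UK3` at the same length as the rest, at the cost of silently turning `U` into unknown data; the fallible variant makes that decision visible to the user instead.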
A fallible alternative is here: