Closed by Ge0rges 8 months ago
You're the best, @Ge0rges! Thank you very much for catching the bug and fixing it :)
@meren what do you think about the ambiguous character question?
Ah, sorry, I completely misread that sentence. I am not sure what to suggest for that. Under what circumstances does it become a necessity? When we convert single-letter alphabets to verbose names?
The case I ran into is that I have protein sequences containing these characters (so they are still valid, but not recognized as such) that I give to anvi-run-ncbi-cogs, which also treats them as invalid because (I think) it uses utils.utils.is_gene_sequence_clean (and/or another part of the script). I think these characters should be deemed valid input for anvi'o programs.
As discussed on Discord, the script was not using the correct AA alphabet. That is fixed here, along with another bug where the existence of flags caused the script to skip over some reads. Before this is merged, we should also consider whether adding ambiguous characters to the AA alphabet is warranted (e.g. BZJ).
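To make the ambiguous-character question concrete, here is a minimal sketch of what an opt-in check could look like. It is not anvi'o's actual implementation of is_gene_sequence_clean: the names STANDARD_AA, AMBIGUOUS_AA, and invalid_residues are hypothetical, and the only ambiguity codes included are the three mentioned above (B for D or N, Z for E or Q, J for I or L).

```python
# Hypothetical sketch, NOT anvi'o's actual validation code.
# The 20 standard amino acid single-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# IUPAC ambiguity codes discussed in this PR:
# B = Asp or Asn, Z = Glu or Gln, J = Ile or Leu.
AMBIGUOUS_AA = set("BZJ")


def invalid_residues(sequence, allow_ambiguous=False):
    """Return a sorted list of characters in `sequence` outside the allowed alphabet."""
    allowed = (STANDARD_AA | AMBIGUOUS_AA) if allow_ambiguous else STANDARD_AA
    return sorted(set(sequence.upper()) - allowed)


# Strict alphabet: the ambiguous codes are reported as invalid.
print(invalid_residues("MKBZJL"))                        # ['B', 'J', 'Z']

# Permissive alphabet: the same sequence passes.
print(invalid_residues("MKBZJL", allow_ambiguous=True))  # []
```

Keeping the ambiguous codes in a separate set, behind an explicit switch, would let strict callers keep the current behavior while programs that only pass sequences through to external tools could opt in.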