Closed clairelavergne closed 1 year ago
Dear @clairelavergne,
Interesting example! One of your sequences has an indel whose length is not a multiple of 3 (most likely an alignment artifact), as shown below.
The `TGT` codon gets split into `TG-` and `--T` with intervening gaps. `TG-` is interpreted by HyPhy as `TGN`, i.e. `TG{A,C,G,T}`. One of these, `TGA`, is a stop codon, hence you get the warning.
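To make the resolution step above concrete, here is a minimal Python sketch. It is not HyPhy's actual implementation; the function names (`resolutions`, `possible_stops`) are hypothetical, and it assumes the universal genetic code and that a gap or `N` inside a partially gapped triplet can stand for any of the four nucleotides.

```python
from itertools import product

# Assumption: universal genetic code; other codes have different stop sets.
STOP_CODONS = {"TAA", "TAG", "TGA"}
NUCLEOTIDES = "ACGT"


def resolutions(codon):
    """Return every fully resolved codon that a partially gapped triplet could represent.

    Each '-' or 'N' is treated as "any nucleotide", mirroring the
    TG- -> TGN -> TG{A,C,G,T} reading described above. This is meant for
    partial codons only; an all-gap triplet (---) is a proper gap codon.
    """
    options = [NUCLEOTIDES if ch in "-N" else ch for ch in codon.upper()]
    return ["".join(p) for p in product(*options)]


def possible_stops(codon):
    """Return the stop codons among the possible resolutions of a triplet."""
    return [c for c in resolutions(codon) if c in STOP_CODONS]


print(resolutions("TG-"))     # ['TGA', 'TGC', 'TGG', 'TGT']
print(possible_stops("TG-"))  # ['TGA']
print(possible_stops("--T"))  # []
```

Only the `TG-` half of the split codon can resolve to a stop; `--T` cannot, because no stop codon in the universal code ends in `T`.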
Generally, these "partial" codons are abiological and arise entirely from alignment issues (e.g. aligning with nucleotide-level tools that ignore the reading frame). Our (very simple) codon-aware MSA workflow, for example, generates "fused" codons (as you would expect).
I attach the MSA for your reference.
I am on the fence about whether the current HyPhy behavior is a feature or a bug. On the one hand, `TG-` is not a stop codon (it can be resolved to one, but also to non-stop codons). On the other hand, `TG-` probably should not occur in a "proper" codon-aware in-frame alignment, so it serves as a diagnostic of a potential data quality problem. I think I'll update the language and turn it into a WARNING, rather than an error (in the next release).
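For anyone who wants to run this kind of diagnostic on their own data before uploading, a rough sketch in Python follows. It is an illustration, not part of HyPhy: the helper name `partial_codons` and the example sequence are made up, and it assumes the sequence is already in frame (codon 0 starts at position 0) and that gaps are written as `-`.

```python
def partial_codons(seq):
    """List (codon_index, triplet) for in-frame triplets that contain 1 or 2 gaps.

    Fully gapped triplets (---) and gap-free triplets are considered fine;
    only "partial" codons such as TG- or --T are reported.
    """
    found = []
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete triplet
    for i in range(0, usable, 3):
        triplet = seq[i:i + 3]
        gaps = triplet.count("-")
        if 0 < gaps < 3:
            found.append((i // 3, triplet))
    return found


# Hypothetical aligned sequence in which TGT is split by an out-of-frame gap.
example = "ATGTG---TCCC"
for index, triplet in partial_codons(example):
    print(f"codon {index}: {triplet}")
```

Any hit from a scan like this suggests the alignment was built without regard to the reading frame and is worth re-aligning with a codon-aware workflow.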
Best, Sergei
Hi @spond,
Thank you so much for your reply! I hadn't realized how HyPhy interpreted partial codons. I definitely agree that users should be made aware if they have this kind of issue in their data - I thought I'd taken care of all of the alignment artifacts, but clearly not!
Kind regards, Claire
Hi,
HyPhy (both the web server on Datamonkey and the GUI version) keeps rejecting my text files, saying that there's a stop codon, but accepts the FASTA file of the alignments. I can't figure out why copy-pasting my alignments from FASTA into the text file would cause this issue. I'm attaching my file drd4combo.txt and the error log I got, log.txt. I'm following the guidelines posted here to format my text file. Any insight would be greatly appreciated!