rega-cev / virulign

VIRULIGN: fast codon-correct alignment and annotation of viral genomes
GNU General Public License v2.0

Misalignment of sequences #18

Open TheWhyofFry opened 3 years ago

TheWhyofFry commented 3 years ago

The attached files misalign slightly. If you run virulign with the nef XML on this alignment for HIV and export an amino acid alignment file, the sequences alternate between two layouts at codons 23-54 (screenshot attached):

  1. Starting with a gap extending over 14 codon positions, followed by codons 37-40 aligned to the reference residues (in the alignment), and finishing with a 14-codon insertion.
  2. Starting with an insertion of 14 codons, then an alignment over codons 37-40, with a 14-codon gap at the end.

These sequences are all from the same host and virus. If I realign the amino acids with mafft, the issue is resolved, but that does not help with the position tables, which I like to use downstream.

Is this fixable?

Example of the erroneous alignment:

virulign_misalignment

After realigning the blocks:

virulign_fixed

ktheyss commented 3 years ago

Can you also post the command used? I am not at my laptop, but:

  1. What if you remove all '-' from the alignment prior to aligning?
  2. What if you activate the no insertions parameter?

TheWhyofFry commented 3 years ago

The same results are produced if I remove the gaps (I understand virulign removes gaps anyway?).
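
For anyone wanting to repeat the gap-removal check, here is a minimal sketch in Python of stripping '-' characters from a FASTA file before passing it to the aligner. The function name `strip_gaps` is hypothetical and not part of virulign; it only assumes standard FASTA with header lines starting with '>'.

```python
def strip_gaps(fasta_text):
    """Return FASTA text with every '-' removed from sequence lines.

    Header lines (starting with '>') are kept unchanged. Sequence lines
    that become empty after gap removal are dropped.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            ungapped = line.replace("-", "")
            if ungapped:
                out.append(ungapped)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">seq1\nMGNKWSKC---GWPSV\n>seq2\nMG-KW\n"
    print(strip_gaps(example), end="")
```

The ungapped text can then be written to a new file and used as the input FASTA in place of the original.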

The command I used was:

virulign 11676_nef.xml 271_nef.fasta --exportReferenceSequence yes --exportKind GlobalAlignment --exportAlphabet AminoAcids --exportWithInsertions yes

With --exportWithInsertions no the problem does go away, but of course so do the insertions. Since you are familiar with AGA: I tried a few of the sequences individually with it (using the --global flag and a modified HXB2 GenBank file that extends over the premature stop codon of Nef), and it seems to give a consistent result. (I concatenated the alignments together, without realignment, since AGA only processes one sequence at a time.)

>>nef (hxb2) 
MGGKWSKSSVIGWPTVRERMRRA--------------EPAADRVGAASRDLEKHGAITSS
NTAATNAACAWLEA--QEEEEVGFPVTPQVPLRPMTYKAAVDLSHFLKEKGGLEGLIHSQ
RRQDILDLWIYHTQGYFPD*QNYTPGPGVRYPLTFGWCYKLVPVEPDKIEEANKGENTSL
LHPVSLHGMDDPEREVLEWRFDSRLAFHHVARELHPEYFKNC*
>>271-2002-1/0/shortread/0.658
MGNKWSKC---GWPSVRERMRRTNPAEKSKRERRRQTEQAAEGVGAASRDLDKYGALTSS
NTAATNADCAWLEACEEEEEEVGFPVRPQVPLRPMTYKGAFDLSFFLKEKGGLDGLIHSQ
KRQDILDLWVYHTQGYFPDWHNYTPGPGVRYPLTFGWCFKLVPVDPSEVEEANKGEDNCL
LHPMSQHGMDDGEREVLKWQFDSSLARRHLARELHPEYYKDC*
>>271-2002-1/1/shortread/0.342
MGNKWSKC---GWPSVRERMRRTNPAEKSVRERKRQTEPAAEGVGAASRDLERHGALTSS
NTAATNADCAWVEAHEQEEEEVGFPVRPQVPLRPMTYKGAFDLSFFLKEKGGLDGLIHSQ
KRQDILDLWVYHTQGYFPDWHNYTPGPGVRYPLTFGWCFKLVPVDPREVEEANKGEDNCL
LHPMSLHGMDDGEKEVLKWQFDSSLARRHLARELHPEYYKDC*
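
One way to check whether indel placement is consistent across sequences in an alignment like the one above is to list, per sequence, the insertions and deletions relative to the aligned reference and compare those lists. The following is only a sketch under stated assumptions: `indel_events` is a hypothetical helper, not part of virulign or AGA, and it expects the reference and query rows to be already aligned, of equal length, with '-' as the gap character.

```python
def indel_events(ref_aln, qry_aln):
    """List indels of an aligned query relative to an aligned reference.

    Returns (kind, ref_pos, length) tuples in column order:
      - "insertion": ref_pos is the 1-based reference residue immediately
        before the inserted run (0 if the run precedes the reference).
      - "deletion": ref_pos is the 1-based first reference residue that
        is missing in the query.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    events = []
    current = None  # run being extended: [kind, ref_pos, length]
    ref_pos = 0     # number of reference residues seen so far

    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q == "-":
            continue  # column is a gap in both rows; ignore it
        if r == "-":
            kind = "insertion"
        else:
            ref_pos += 1
            kind = "deletion" if q == "-" else None

        if kind is None:
            current = None  # an aligned residue pair ends any open run
        elif current is not None and current[0] == kind:
            current[2] += 1
        else:
            current = [kind, ref_pos, 1]
            events.append(current)

    return [tuple(e) for e in events]


if __name__ == "__main__":
    # Toy rows, not taken from the attached data.
    print(indel_events("MK--LV", "MKAAL-"))
```

Sequences from the same host that report the same insertion at different reference positions, or an insertion in one and a deletion in another over the same region, would be candidates for the alternating pattern described at the top of the thread.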

Hope this is helpful. Please let me know if you need anything else.