rega-cev / virulign

VIRULIGN: fast codon-correct alignment and annotation of viral genomes
GNU General Public License v2.0

nt-debug (failed pairwise alignment) #7

Open thierryjanssens opened 6 years ago

thierryjanssens commented 6 years ago

Hello,

It seems that some sequences that are successfully aligned to the reference end up in the nt-debug folder, while some that fail to align are not present in the nt-debug folder.

Kind regards,

Thierry

thierryjanssens commented 6 years ago

Hi,

I think I can see a pattern. The headers in the FASTA file I use have the form TAXON_SAMPLENAME_ORFNAME, where the taxon and the ORF name may themselves contain underscores. The failed folder (i.e. the nt-debug output) contains the pairwise alignment for only one sequence per unique taxon-sample combination.

e.g.:

Rhinovirus_A_sample_1_NODE_1-1
Rhinovirus_B_sample_1_NODE_1-1

but

Rhinovirus_B_sample_1_NODE_1-2 does not show up.
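To see which headers this hypothesis predicts would collide, the headers can be grouped by their taxon-sample prefix. The Python sketch below is a hypothetical helper, not part of VIRULIGN. It assumes the sample part of every header has the form sample_<digits>, as in the examples in this thread, and that nt-debug keeps only one alignment per group, which is the suspected behavior rather than a confirmed one.

```python
import re
from collections import defaultdict

# Hypothetical header layout: TAXON_sample_N_ORF, where TAXON and ORF may
# contain underscores. Assumption: the sample part always looks like
# "sample_<digits>", which holds for the headers quoted in this issue.
HEADER_RE = re.compile(r"^(?P<taxon>.+?)_(?P<sample>sample_\d+)_(?P<orf>.+)$")


def group_by_taxon_sample(headers):
    """Group FASTA headers by their (taxon, sample) prefix."""
    groups = defaultdict(list)
    for header in headers:
        match = HEADER_RE.match(header)
        if match is None:
            raise ValueError(f"unexpected header layout: {header}")
        groups[(match["taxon"], match["sample"])].append(header)
    return dict(groups)


if __name__ == "__main__":
    headers = [
        "Rhinovirus_A_sample_1_NODE_1-1",
        "Rhinovirus_B_sample_1_NODE_1-1",
        "Rhinovirus_B_sample_1_NODE_1-2",
    ]
    # Groups with more than one member are where, under the hypothesis,
    # some sequences would be missing from the nt-debug folder.
    for key, members in group_by_taxon_sample(headers).items():
        print(key, members)
```

With these headers, Rhinovirus_B_sample_1_NODE_1-1 and Rhinovirus_B_sample_1_NODE_1-2 fall under the same key, which matches the header reported as missing.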

Kind regards,

Thierry

ktheyss commented 6 years ago

To replicate this problem, a test FASTA file was created containing Dengue serotype 1, Dengue serotype 3 and HIV-1 sequences.

The headers are:

virus_A_sample_1_NODE_1-1
virus_A_sample_1_NODE_1-2
virus_A_sample_2_NODE_1-1
virus_A_sample_3_NODE_1-1
virus_B_sample_1_NODE_1-1
virus_B_sample_1_NODE_1-2
virus_B_sample_1_NODE_1-3
denv1_A_sample_1_NODE_1-1
denv1_B_sample_1_NODE_1-1
denv1_B_sample_1_NODE_1-2
denv1_B_sample_2_NODE_2-2
denv1_B_sample_2_NODE_2-2
denv1_B_sample_1_NODE_2-3
denv1_B_sample_2_NODE_1-2

The command used was:

virulign DENV1-NC001477.xml test.txt --nt-debug debugfolder/
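The header list above contains denv1_B_sample_2_NODE_2-2 twice. If debug output files were named after the sequence header (an assumption; the thread does not confirm how nt-debug names its files), identical headers could overwrite each other, so ruling out duplicates before interpreting the nt-debug folder may help. A minimal Python sketch that reads FASTA headers and reports repeats follows; the function names are hypothetical, and the sequence lines in the demo are placeholders.

```python
from collections import Counter


def fasta_headers(lines):
    """Yield the header of each FASTA record (text after '>', stripped)."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def duplicate_headers(headers):
    """Return the headers that occur more than once, with their counts."""
    counts = Counter(headers)
    return {header: n for header, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Placeholder records; the "ACGT" sequence lines are made up for the demo.
    demo = [
        ">denv1_B_sample_2_NODE_2-2", "ACGT",
        ">denv1_B_sample_2_NODE_2-2", "ACGT",
        ">denv1_B_sample_1_NODE_2-3", "ACGT",
    ]
    for header, count in sorted(duplicate_headers(fasta_headers(demo)).items()):
        print(f"{header}: {count} records")
```

To check the real input, pass an open file handle instead of the demo list, e.g. inside `with open("test.txt") as handle:` call `duplicate_headers(fasta_headers(handle))`.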

Two remarks to be considered: