gwcbi / haphpipe

NGS viral assembly and population genetics
GNU General Public License v3.0
9 stars 7 forks source link

Checking that sequence is multiple of 3 for pairwise_align. #14

Open kmgibson opened 4 years ago

kmgibson commented 4 years ago
hp_pairwise_align --amplicons_fa haphpipe_assemble_02/final.fna --ref_fa ../HIV_B.K03455.HXB2.fasta --ref_gtf ../HIV_B.K03455.HXB2.gtf --outdir haphpipe_assemble_02

[--- pairwise_align ---] Using temporary directory /local/tmpHP_pairwise_alignlvd6l4cg
/GWSPH/home/kmgibson/.conda/envs/haphpipe_10072019/lib/python3.7/site-packages/Bio/Seq.py:2715: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning)
/GWSPH/home/kmgibson/.conda/envs/haphpipe_10072019/lib/python3.7/site-packages/Bio/Seq.py:2715: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning)
Alignment validation passed
Alignment validation passed
Alignment validation passed
sid|PL12|ref|HIV_B.K03455.HXB2| alignment validation passed
sid|PL12|ref|HIV_B.K03455.HXB2| haphpipe    amplicon    2252    3869    .   +   .   name "PRRT"; call_len "1618"; call_reg "2252-3869"; match "1054"; mismatch "564"; qgap "0"; rgap "0"; uncalled "0";
sid|PL12|ref|HIV_B.K03455.HXB2| haphpipe    amplicon    4230    5093    .   +   .   name "INT"; call_len "864"; call_reg "4230-5093"; match "836"; mismatch "28"; qgap "0"; rgap "0"; uncalled "0";
sid|PL12|ref|HIV_B.K03455.HXB2| haphpipe    amplicon    5838    7791    .   +   .   name "gp120"; call_len "1954"; call_reg "5838-7791"; match "1722"; mismatch "192"; qgap "15"; rgap "40"; uncalled "0";

I need to double check the GTF form and make sure it is in multiple of threes, and take this possibility into account into the code.

kmgibson commented 4 years ago

Also add more to the error message - pertaining to this error:

(haphpipe) [uzma_rentia@hth001 Uzma_hp]$ haphpipe pairwise_align --amplicons_fa final.fna --ref_fa HIV_B.K03455.HXB2.fasta --ref_gtf HIV_B.K03455.HXB2_2.gtf

[--- pairwise_align ---] Using temporary directory /tmp/tmpHP_pairwise_alignr6red5e0
Traceback (most recent call last):
  File "/GWSPH/home/uzma_rentia/.conda/envs/haphpipe/lib/python3.7/site-packages/haphpipe/stages/pairwise_align.py", line 117, in pairwise_align
    gl = ampdict[(aid['ref'], aid['reg'])]
KeyError: 'reg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GWSPH/home/uzma_rentia/.conda/envs/haphpipe/bin/haphpipe", line 8, in <module>
    sys.exit(console())
  File "/GWSPH/home/uzma_rentia/.conda/envs/haphpipe/lib/python3.7/site-packages/haphpipe/haphpipe.py", line 174, in console
    args.func(**sysutils.args_params(args))
  File "/GWSPH/home/uzma_rentia/.conda/envs/haphpipe/lib/python3.7/site-packages/haphpipe/stages/pairwise_align.py", line 119, in pairwise_align
    poss_gl = [t for t in ampdict.keys() if t[1] == aid['reg']]
  File "/GWSPH/home/uzma_rentia/.conda/envs/haphpipe/lib/python3.7/site-packages/haphpipe/stages/pairwise_align.py", line 119, in <listcomp>
    poss_gl = [t for t in ampdict.keys() if t[1] == aid['reg']]
KeyError: 'reg'

This error was because the output file used as the amplicon fasta did not contain amplicons - just a single large sequence. Need to also address this in the manual.