raphael-group / multibreak-sv

MultiBreak-SV identifies structural variants from next-generation paired end data, third-generation long read data, or data from a combination of sequencing platforms.

Error in Writing Assignment File phase #8

Closed MeHelmy closed 7 years ago

MeHelmy commented 8 years ago

I get this error while running the M5toMBSV.py script. Any clue?

SPLITTING ESP FILE
maxgap is 87291
determining Lmin/Lmax for every discordant pair.
0 lines written to translocations file tolerant-RunGASV/binned-esps//translocations
wrote to file tolerant-RunGASV/gasv.in
Arguments: tolerant_tolerant-FormatAlignments/esps-outer-coords.txt.sorted_tolerant-FormatAlignments/esps.full_tolerant-FormatAlignments/adjustedalignments.txt_tolerant-FormatAlignments/espmapping-with-hmmdels.txt_tolerant-FormatAlignments/hmmdel-lengths.txt_pacbio

Writing Assignment File...
getting ESPs...
33164 ESPs from 0 longreads
longread_974419 ...
Traceback (most recent call last):
  File "/home/medhat/source/multibreak-sv/bin/M5toMBSV.py", line 870, in <module>
    main(sys.argv)
  File "/home/medhat/source/multibreak-sv/bin/M5toMBSV.py", line 100, in main
    assignmentfile = makeAssignmentFile(prefix,sortedfile,fullespfile,adjustedalignmentfile,mapfile,hmmdellenfile,experiment)
  File "/home/medhat/source/multibreak-sv/bin/M5toMBSV.py", line 634, in makeAssignmentFile
    print [a for a in multireads][0],'...'
IndexError: list index out of range

annaritz commented 8 years ago

Hi @MeHelmy, it seems that there may be a naming issue with the long reads (the line "33164 ESPs from 0 longreads" looks suspicious, since ESPs are generated from long reads). Can you provide more information about your input files and the output files that are generated?

MeHelmy commented 8 years ago

The alignment command:

blasr ./pacbio.fa ./genome.fa --nproc 28 --bestn 1 --noSplitSubreads --clipping soft -m 5 --nCandidates 15 --sdpTupleSize 6 --out ./result/pacbio_align_blasr.m5

The SV command:

python M5toMBSV.py --prefix tolerant ~/source/gasv/ ~/source/multibreak-sv/lib /data/correct_pacbio_2016-03-31/analysis/alignment/245_blasr/245_maize_pacbio_align_blasr.m5 > out.log 2>&1

Input file header:

head -n1 245_maize_pacbio_align_blasr.m5

m141114_221139_42149_c100737122550000001823154305141593_s1_p0/122526/0347 320 0 320 + 5 217959525 52294069 52294390 - -1529 314 6 0 1 25 TACCTCCGATAACTTCTTCCCTGCCTTTTCTTTCCTAGATCCATAGCCTTCCTGTTTGTTAGCA-GCTTCATCTGAATCTTCATCTCTGTTTTCCTCACTGTCTGACTTCATGGATTCATCCTCATCAATATCATACTGGAGCGCTTTATTCTGCCTTTTACTAGAAGTATCGTCATCGACAAATTTATGTAGCAAGAACAGGCAATGAGAATGTAGGTAATACAATGATAAAAAGAAGGATGGCCCAACTGACCATGATAGTGTTAAAATAATCATTGAAAATACACAACAGATATAAGCAAGAAAAGCTTCATTTGAGC ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||*|||||||||||||||||||||||||||||||||||_||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| TACCTCCGATAACTTCTTCCCTGCCTTTTCTTTCCCAGATCCATAGCGTTCCTGTTTGTTAGCATACTTCATCTGAATCTTCTTCTATGTTTTCCTCACTGTCAGACTTCATGGATTCATCCTCATCAATATCATACTGGAGCGCTTTATTCTGCCTTTTACTAGAAGTATCGTCATCGACAAATTTATGTAGCAAGAACAGGCAATGAGAATGTAGGTAATACAATGATAAAAAGAAGGATGGCCCAACTGACCATGATAGTGTTAAAATAATCATTGAAAATACACAACAGATATAAGCAAGAAAAGCTTCATTTGAGC

Output files:

Three directories:

tolerant-FormatAlignments tolerant-MBSVinputs tolerant-RunGASV

tolerant-MBSVinputs contains cluster-subproblems, which is empty.

tolerant-RunGASV contains binned-esps and gasv.in; binned-esps contains 33165 files, for example intrachrom-longread_942027.

tolerant-FormatAlignments contains these files: adjustedalignments.txt

annaritz commented 8 years ago

Ok, thanks - this all looks reasonable. You use some different blasr arguments, but the .m5 file looks fine. Two questions:

(1) You can run the example provided with the code with no problems, correct?

(2) Can you make a smaller .m5 file that has the same problem and attach it? For example, try taking all alignments for one or two reads in the .m5 file (~10-20 alignments). I will be able to debug it.
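A minimal sketch of one way to build such a reduced file, assuming the first whitespace-separated column of each .m5 record is the read (query) name, as in the header line shown above. The function name `subset_m5` and the `max_reads` parameter are hypothetical and are not part of MultiBreak-SV.

```python
def subset_m5(lines, max_reads=2):
    """Return every alignment line that belongs to the first `max_reads` distinct reads.

    Assumption: column 1 of an .m5 record is the read (query) name.
    """
    selected = []  # read names kept so far, in order of first appearance
    kept_lines = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        read_name = fields[0]
        if read_name not in selected:
            if len(selected) >= max_reads:
                continue  # a read we did not select; drop its alignments
            selected.append(read_name)
        kept_lines.append(line)
    return kept_lines

# Possible usage (file names are placeholders):
# with open("full.m5") as src, open("test.txt", "w") as dst:
#     dst.writelines(subset_m5(src, max_reads=2))
```

Because the whole file is scanned, alignments of a selected read are kept even when they are not adjacent in the .m5 file.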

MeHelmy commented 8 years ago

@annaritz 1) Yes, the example runs OK. 2) The documentation does not specify optimal parameters for running blasr; can you please give an example or a suggestion? 3) Please find the attached example, test.txt.

annaritz commented 8 years ago

I ran BLASR with default parameters; it is unlikely that the m4 format has changed, but I know that BLASR development is pretty active, so they may have changed something I haven't noticed yet. Your BLASR parameters look reasonable, but I haven't tested MultiBreak-SV with those parameters.

I will use your test.txt example to debug the problem - thanks.

annaritz commented 8 years ago

@MeHelmy - the test.txt file dies with a different error than the one posted in this GitHub issue. I ran the following:

python M5toMBSV.py <path-to-gasv-dir> <path-to-lib-dir> test.txt

I committed a change so that it now prints an error message if there are zero ESPs to cluster:

!!!!!!!!!!!
There are zero multi-breakpoint mappings; thus no SVs can be clustered. Exiting.
!!!!!!!!!!!

Please update your repo and re-run. Once you confirm that you are still seeing the original error on your full .m5 file, make another test.txt that dies with the same error. Thanks.

MeHelmy commented 8 years ago

@annaritz the rerun gives the same error:

SPLITTING ESP FILE
maxgap is 87291
determining Lmin/Lmax for every discordant pair.
0 lines written to translocations file 245-RunGASV/binned-esps//translocations
wrote to file 245-RunGASV/gasv.in
Arguments: 245_245-FormatAlignments/esps-outer-coords.txt.sorted_245-FormatAlignments/esps.full_245-FormatAlignments/adjustedalignments.txt_245-FormatAlignments/espmapping-with-hmmdels.txt_245-FormatAlignments/hmmdel-lengths.txt_pacbio

Writing Assignment File...
getting ESPs...
33164 ESPs from 0 longreads
longread_974419 ...
Traceback (most recent call last):
  File "/home/medhat/source/multibreak-sv/bin/M5toMBSV.py", line 870, in <module>
    main(sys.argv)
  File "/home/medhat/source/multibreak-sv/bin/M5toMBSV.py", line 100, in main
    assignmentfile = makeAssignmentFile(prefix,sortedfile,fullespfile,adjustedalignmentfile,mapfile,hmmdellenfile,experiment)
  File "/home/medhat/source/multibreak-sv/bin/M5toMBSV.py", line 634, in makeAssignmentFile
    print [a for a in multireads][0],'...'
IndexError: list index out of range

This file produces the error: test.txt

From debugging:

In the method makeAssignmentFile (line 620), matchObj always comes back empty, so read1 and read2 are never assigned. When print [a for a in multireads][0],'...' is then called, it raises this error.

So we could check before printing and report an error. (Of course, this does not explain why matchObj is always empty; I do not know the logic of the script.)
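The check suggested above could look roughly like the sketch below. It is not the script's actual code: `first_or_exit` is a hypothetical helper, and only the idea of testing for an empty collection before indexing it comes from this thread.

```python
import sys

def first_or_exit(multireads, what="multi-breakpoint mappings"):
    """Return the first element of `multireads`, or exit with a message if it is empty.

    Hypothetical helper: guards the `[a for a in multireads][0]` pattern
    that raises IndexError when nothing was collected.
    """
    items = list(multireads)
    if not items:
        # sys.exit with a string writes it to stderr and exits with status 1
        sys.exit("There are zero %s; nothing to assign. Exiting." % what)
    return items[0]
```

A guard like this only turns the crash into a readable message; it does not explain why no long reads were matched in the first place.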

gn01786955 commented 7 years ago

I made my m5 file by aligning the raw reads against my reference genome, and I found that the target sequence must be named as a chromosome. You can try it. For example, this line:

m000000_000000_00000_cSIMULATED_s0_p0/0/0_1561 1561 1042 1143 + tig00000001 78774742 42483345 42483437 + -207 76 8 17 8 0

gives the error message:

WARNING: tig00000001 is not a recognized chromosome. Ignoring this ESP.

while this line succeeds:

m000000_000000_00000_cSIMULATED_s0_p0/0/0_1561 1561 1042 1143 + chr17 78774742 42483345 42483437 + -207 76 8 17 8 0

annaritz commented 7 years ago

@gn01786955 glad you fixed your issue! Yes, it relies on numbered chromosomes (e.g., either 17 or chr17). Relabeling chromosomes should solve your problem - thanks for the update.

However, this thread is raising a different issue that is specific to another user's data. I am happy to reopen this issue if it is still a problem.
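For anyone hitting the chromosome-naming warning discussed above, here is a rough sketch of a pre-check on an .m5 file. It assumes the sixth whitespace-separated column is the target name (as in the example lines in this thread) and that "numbered chromosome" means an optional chr prefix followed by digits. The exact rule M5toMBSV.py applies may differ; `unrecognized_targets` is a hypothetical name.

```python
import re
from collections import Counter

# Assumption: accepted names look like "17" or "chr17".
NUMBERED_CHROM = re.compile(r"^(chr)?\d+$")

def unrecognized_targets(lines):
    """Count alignment lines per target name that is not a numbered chromosome."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a full alignment record
        target_name = fields[5]  # sixth column: target (reference) name
        if not NUMBERED_CHROM.match(target_name):
            counts[target_name] += 1
    return counts

# Possible usage (file name is a placeholder):
# with open("alignments.m5") as handle:
#     for name, n in unrecognized_targets(handle).items():
#         print(name, n)
```

Any names this reports would need to be relabeled in the reference or in the .m5 file before running the script.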