ksahlin / BESST

BESST - scaffolder for genomic assemblies

division by zero on very long insert libraries #42

Closed rchikhi closed 8 years ago

rchikhi commented 8 years ago

Hi K!

I'm trying to run BESST on chr14 data with 35 kbp insert jumping libraries. It crashes because it couldn't estimate a "cov" parameter.

The error message:


Creating contig graph with library:  assembly.lib_1.bam
Traceback (most recent call last):
  File "/home/rayan/gatb-pipeline/BESST/runBESST", line 415, in <module>
    main(args)
  File "/home/rayan/gatb-pipeline/BESST/runBESST", line 177, in main
    (G, G_prime) = CG.PE(Contigs, Scaffolds, Information, C_dict, param, small_contigs, small_scaffolds, bam_file)      #Create graph, single out too short contigs/scaffolds and store them in F
  File "/home/rayan/gatb-pipeline/BESST/BESST/CreateGraph.py", line 276, in PE
    infer_spurious_link_count_threshold(G_prime, param)
  File "/home/rayan/gatb-pipeline/BESST/BESST/CreateGraph.py", line 325, in infer_spurious_link_count_threshold
    link_params = e_nr_links.Param(param.mean_ins_size, param.std_dev_ins_size, cov, param.read_len, 0)
  File "/home/rayan/gatb-pipeline/BESST/BESST/e_nr_links.py", line 65, in __init__
    self.readfrequency = 2 * self.read_len / self.cov
ZeroDivisionError: float division by zero
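The crash comes from `e_nr_links.Param` dividing by a coverage value that ended up as 0.0. A minimal sketch of the failure mode (class simplified from `e_nr_links.py`; the `safe_read_frequency` guard is a hypothetical workaround, not BESST's actual fix):

```python
# Simplified reproduction of the failing constructor in e_nr_links.py.
class Param(object):
    def __init__(self, mean, stddev, cov, read_len, softclipped):
        self.read_len = float(read_len)
        self.cov = float(cov)
        # When mean coverage collapses to 0.0 after filtering,
        # this line raises ZeroDivisionError.
        self.readfrequency = 2 * self.read_len / self.cov

def safe_read_frequency(read_len, cov):
    """Hypothetical guard: return None instead of dividing by zero."""
    return 2 * float(read_len) / cov if cov > 0 else None

print(safe_read_frequency(96.0, 0.2096))  # normal case
print(safe_read_frequency(96.0, 0.0))     # the crashing case -> None

try:
    Param(35000.0, 5000.0, 0.0, 96.0, 0)
except ZeroDivisionError as e:
    print("crash:", e)  # -> crash: float division by zero
```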

Statistics.txt for that pass:


PASS 2

-T 65000.0 -t 55000.0

LIBRARY STATISTICS
Mean of library set to: 35000.0
Standard deviation of library set to:  5000.0
MP library PE contamination:
Contamine rate (rev comp oriented) estimated to:  False
lib contamine mean (avg fragmentation size):  0
lib contamine stddev:  0
Number of contamined reads used for this calculation:  0.0
-T (library insert size threshold) set to:  65000.0
-k set to (Scaffolding with contigs larger than):  55000.0
Number of links required to create an edge:  None
Maximum identical contig-end overlap-length to merge of contigs that are adjacent in a scaffold:  200
Read length set to:  96.0

Time elapsed for getting libmetrics, iteration 1: 0.181855916977

Parsing BAM file...
L50:  68 N50:  362755 Initial contig assembly length:  84440731
Nr of contigs/scaffolds that was singeled out due to length constraints 108
Time cleaning BESST objects for next library:  0.00544214248657
Total time elapsed for initializing Graph:  0.156940937042
Reading bam file and creating scaffold graph...
ELAPSED reading file: 0.558443069458
NR OF FISHY READ LINKS:  0
Number of USEFUL READS (reads mapping to different contigs uniquly):  21272
Number of non unique reads (at least one read non-unique in read pair) that maps to different contigs (filtered out from scaffolding):  1913
Reads with too large insert size from "USEFUL READS" (filtered out):  7266
Initial number of edges in G (the graph with large contigs):  0
Initial number of edges in G_prime (the full graph of all contigs before removal of repats):  15217
Number of duplicated reads indicated and removed:  2577
Mean coverage before filtering out extreme observations =  0.209605479767
Std dev of coverage before filtering out extreme observations=  0.193094966108
Mean coverage after filtering =  0.0
Std coverage after filtering =  0.0
Length of longest contig in calc of coverage:  70661
Length of shortest contig in calc of coverage:  203
Number of edges in G (after repeat removal):  0
Number of edges in G_prime (after repeat removal):  15217
Number of BWA buggy edges removed:  0
Number of edges in G (after filtering for buggy flag stats reporting):  0
Number of edges in G_prime  (after filtering for buggy flag stats reporting):  15217
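The log above shows mean coverage dropping from ~0.21 to exactly 0.0 after "filtering out extreme observations". A sketch of how that can happen on a very low-coverage run: if the filtering window excludes every observation and the code falls back to 0.0, that zero later feeds the `readfrequency` division. (The exact filtering rule here is an assumption for illustration, not BESST's real code.)

```python
# Hypothetical illustration: a coverage window that misses every
# observation leaves an empty list, and a 0.0 fallback then
# propagates into later divisions.
def mean_after_filter(coverages, low, high):
    kept = [c for c in coverages if low <= c <= high]
    # Empty result -> fall back to 0.0, the value that later
    # triggers "float division by zero" in readfrequency.
    return sum(kept) / len(kept) if kept else 0.0

covs = [0.05, 0.1, 0.2, 0.4, 0.6]          # hypothetical per-contig coverages
print(mean_after_filter(covs, 1.0, 5.0))    # window misses all -> 0.0
print(mean_after_filter([2.0, 4.0], 1.0, 5.0))
```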
ksahlin commented 8 years ago

Yep, it's a known bug. I'll try to fix it soon. If you want to complete this run, I think specifying -z 10000 (or any high number) will work. It deactivates the automatic coverage filtering that causes the bug.

rchikhi commented 8 years ago

Perfect, completing this run is all I needed :) thanks

rchikhi commented 8 years ago

For the record, -z even with a high value didn't solve it, but pull request #43 did.

ptranvan commented 7 years ago

Hi,

did you fix it? I downloaded version 2.2.4 and I got the same error.

ksahlin commented 7 years ago

It's supposed to be fixed in #43, which is version 2.2.5. Let me know if you get the same error with that version.

YoannAnselmetti commented 7 years ago

Hi Kristoffer,

I get exactly the same error with version 2.2.5 as Rayan Chikhi. But it's very strange, since my long insert size library (~35 kb) works, while my MP (1.5 kb) and PE (180 bp) libraries don't.

And I get two kinds of errors:

Traceback (most recent call last):
  File "/home/yanselmetti/.local/bin/runBESST", line 415, in <module>
    main(args)
  File "/home/yanselmetti/.local/bin/runBESST", line 177, in main
    (G, G_prime) = CG.PE(Contigs, Scaffolds, Information, C_dict, param, small_contigs, small_scaffolds, bam_file)      #Create graph, single out too short contigs/scaffolds and store them in F
  File "/home/yanselmetti/.local/lib/python2.7/site-packages/BESST/CreateGraph.py", line 276, in PE
    infer_spurious_link_count_threshold(G_prime, param)
  File "/home/yanselmetti/.local/lib/python2.7/site-packages/BESST/CreateGraph.py", line 325, in infer_spurious_link_count_threshold
    link_params = e_nr_links.Param(param.mean_ins_size, param.std_dev_ins_size, cov, param.read_len, 0)
  File "/home/yanselmetti/.local/lib/python2.7/site-packages/BESST/e_nr_links.py", line 65, in __init__
    self.readfrequency = 2 * self.read_len / self.cov
ZeroDivisionError: float division by zero

That is the same error Rayan Chikhi got.

And another kind of error:

Traceback (most recent call last):
  File "/home/yanselmetti/.local/bin/runBESST", line 415, in <module>
    main(args)
  File "/home/yanselmetti/.local/bin/runBESST", line 177, in main
    (G, G_prime) = CG.PE(Contigs, Scaffolds, Information, C_dict, param, small_contigs, small_scaffolds, bam_file)      #Create graph, single out too short contigs/scaffolds and store them in F
  File "/home/yanselmetti/.local/lib/python2.7/site-packages/BESST/CreateGraph.py", line 256, in PE
    Contigs, Scaffolds, G = RepeatDetector(Contigs, Scaffolds, G, param, G_prime, small_contigs, small_scaffolds, Information)
  File "/home/yanselmetti/.local/lib/python2.7/site-packages/BESST/CreateGraph.py", line 986, in RepeatDetector
    GO.repeat_contigs_logger(Repeats, Contigs, param.output_directory, small_contigs, param)
  File "/home/yanselmetti/.local/lib/python2.7/site-packages/BESST/GenerateOutput.py", line 63, in repeat_contigs_logger
    print >> repeat_logger_file, "{0}\t{1}\t{2}\t{3}\t{4}\t{5}".format(cont_obj.name, cont_obj.length, round(cont_obj.coverage,1), round(cont_obj.coverage/param.mean_coverage,0), round(param.mean_ins_size,0), placable)
ZeroDivisionError: float division by zero
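This second traceback fails in a different place: the repeat logger divides per-contig coverage by `param.mean_coverage`, which is 0.0 in this run. A sketch of that failure with a hypothetical guard (not BESST's actual fix):

```python
# The repeat logger computes round(contig_coverage / mean_coverage, 0);
# with mean_coverage == 0.0 that raises ZeroDivisionError.
def coverage_ratio(contig_coverage, mean_coverage):
    """Hypothetical guard around the ratio logged per repeat contig."""
    if mean_coverage <= 0:
        # No meaningful ratio when mean coverage is zero.
        return float("inf") if contig_coverage > 0 else 0.0
    return round(contig_coverage / mean_coverage, 0)

print(coverage_ratio(10.4, 2.0))  # normal case -> 5.0
print(coverage_ratio(10.4, 0.0))  # the crashing case, handled -> inf
```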

In my command line, I use the "-z 10000" option. Should I increase this number to avoid these errors?

Thank you in advance.