ksahlin / BESST_RNA

Scaffolding of genomic assemblies with RNA seq data

ZeroDivisionError: float division by zero while running script #1

Closed by aseetharam 9 years ago

aseetharam commented 9 years ago

I was trying to test this script and ran it on my dataset (as described in your README), but I got this error:

python Main.py 1 -c genome.scf.fasta  -f Trinity_mapped_sorted.bam -o improved_scf.fa -e 3 -T 20000 -k 500 -d 1 -z 1000
Main.py:224: UserWarning: parameter -g (treating haplotypic regions) inactivated, parameters -a and -b will not have any effect if specified.
  warnings.warn('parameter -g (treating haplotypic regions) inactivated, parameters -a and -b will not have any effect if specified. ')
Starting scaffolding with library:  Trinity_mapped_sorted.bam
Parsing BAM file...
Computing parameters not set by user...

Mean of library set to: No mean calc since RNA reads
Standard deviation of library set to: No std calc since RNA reads
-T (library insert size threshold) set to:  20000
-k set to (Scaffolding with contigs larger than):  500
Number of links required to create an edge:  3
Read length set to:  1109.80555556
Relative weight of dominating link set to (default=3):  3

LG50:  16433 NG50:  32716 Initial contig assembly length:  1563792425
Nr of contigs/scaffolds included in scaffolding: 207915
Total time elapsed:  4.58398103714
USEFUL READS (reads mapping to different contigs):  0
Reads with too large insert size from "USEFUL READS" (filtered out):  0
Number of duplicated reads indicated and removed:  0
Mean coverage before filtering out extreme observations =  0.0
Std dev of coverage before filtering out extreme observations=  0.0
Quantile for repeat detector chosen to: 3.88445706788
Traceback (most recent call last):
  File "Main.py", line 247, in <module>
    options.mapquality)
  File "Main.py", line 79, in Main
    (G, Contigs, Scaffolds, F, param) = CG.PE(Contigs, Scaffolds, F, Information, output_dest, C_dict, param)      #Create graph, single out too short contigs/scaffolds and store them in F
  File "/home/arnstrm/BESST_RNA/src/CreateGraph.py", line 234, in PE
    mean_cov, std_dev_cov = CalculateMeanCoverage(Contigs, param.first_lib, output_dest, param.bamfile)
  File "/home/arnstrm/BESST_RNA/src/CreateGraph.py", line 352, in CalculateMeanCoverage
    mean_cov = sum(filtered_list) / n
ZeroDivisionError: float division by zero

Do you know if I did something wrong? FYI, I used Trinity de novo assembled transcripts and mapped them to my assembled genome using GMAP (default parameters).

I greatly appreciate any help. Thanks!

aseetharam commented 9 years ago

Ah, I see: "reads mapping to different contigs" is 0! Sorry, my bad. Please close the issue!
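For anyone else who lands on the same traceback: the crash comes from dividing by zero in `mean_cov = sum(filtered_list) / n`. Below is a minimal sketch of a guard that would turn it into a readable error. It is not the code in CreateGraph.py; the function name `mean_coverage` is hypothetical, and it assumes `n` is simply the number of coverage observations left after filtering.

```python
def mean_coverage(filtered_list):
    """Return the mean of the coverage observations.

    Hypothetical helper: assumes n is the number of observations
    that survived filtering (not confirmed against CreateGraph.py).
    """
    n = len(filtered_list)
    if n == 0:
        # Fail with an explicit message instead of a ZeroDivisionError.
        raise ValueError(
            "No coverage observations to average; check that the BAM file "
            "contains reads mapping to different contigs."
        )
    return sum(filtered_list) / float(n)
```

With an empty list this raises a `ValueError` that points at the input, rather than failing deep inside the coverage calculation.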

ksahlin commented 9 years ago

Hi,

BESST uses information from genomic read pairs to scaffold contigs. BESST_RNA is a modified version of BESST in which RNA-seq data can be used for scaffolding (we don't know the genomic distance between RNA read pairs, since it depends on the intron size). In both cases, however, the software needs to know which reads/segments belong together in order to scaffold. BESST and BESST_RNA extract this information from the BAM file, so the segments need to be aligned in such a way that pairs are recognised. Hope this helps with your issue!
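As a rough way to check this before running the scaffolder, the sketch below counts alignment records that are flagged as paired and whose mate maps to a different contig. It works on SAM text lines and uses only the standard SAM fields (FLAG, RNAME, RNEXT). The function name `count_cross_contig_pairs` is made up for illustration, and the criterion is an approximation, not necessarily the exact filter BESST_RNA applies.

```python
def count_cross_contig_pairs(sam_lines):
    """Count paired records whose mate is aligned to a different reference.

    Illustrative approximation only; the exact rule used by BESST_RNA
    may differ.
    """
    count = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        paired = flag & 0x1         # template has multiple segments
        unmapped = flag & 0x4       # this segment is unmapped
        mate_unmapped = flag & 0x8  # the mate is unmapped
        if not paired or unmapped or mate_unmapped:
            continue
        # RNEXT "=" means same reference as RNAME; "*" means unavailable.
        if rnext not in ("=", "*") and rnext != rname:
            count += 1
    return count
```

The function accepts any iterable of SAM lines, for example the text produced by `samtools view` read from `sys.stdin`. Transcripts aligned as single sequences carry no pairing flag, so for such input the count would be 0.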