hall-lab / speedseq

A flexible framework for rapid genome analysis and interpretation
MIT License

LumpySV failed on certain cases #66

Open MagpiePKU opened 8 years ago

MagpiePKU commented 8 years ago

We tried to solve this by compiling the newest lumpy-sv version from GitHub and pointing speedseq at it, but it still fails. Some data are processed without any problem, but some runs die.

Runtime log:

```
Sourcing executables from /work/zy/bin/speedseq/bin/speedseq.config ...
Checking for required python modules (/usr/local/bin/python2.7)...
Running LUMPY express
Sourcing executables from /work/zy/bin/speedseq/bin/speedseq.config ...
Checking for required python modules (/usr/local/bin/python2.7)...
create temporary directory
Calculating insert distributions...
Library read groups: 20151112-SHION-01-1057M
Library read length: 150
Removed 0 outliers with isize >= 12183219
done 3
Calculating insert distributions...
Library read groups: 20151112-SHION-01-1057PD
Library read length: 150
/usr/local/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/local/lib/python2.7/site-packages/numpy/core/_methods.py:71: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/work/zy/bin/speedseq//bin/pairend_distro.py", line 106, in
    (removed, upper_cutoff))
TypeError: %d format: a number is required, not numpy.float64
```
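The warnings and the traceback fit together: "Mean of empty slice" suggests the second read group contributed no usable read pairs, so its insert-size statistics come out as nan, which then cannot be formatted with `%d`. A minimal sketch of that failure chain (this is illustrative, not the actual `pairend_distro.py` code):

```python
import warnings
import numpy as np

# Hypothetical reconstruction: a read group with no sampled pairs yields an
# empty insert-size array, whose mean/stdev are nan (with the same
# RuntimeWarnings seen in the log above).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # "Mean of empty slice."
    inserts = np.array([], dtype=float)
    mean = inserts.mean()    # nan
    stdev = inserts.std()    # nan

upper_cutoff = mean + 10 * stdev  # nan propagates into the cutoff

# Formatting a nan numpy.float64 with "%d" then aborts the run. On the
# reporter's Python 2.7 / numpy combination this surfaced as the TypeError
# in the traceback; current Pythons raise ValueError for nan instead.
try:
    print("Removed %d outliers with isize >= %d" % (0, upper_cutoff))
except (TypeError, ValueError) as exc:
    print("formatting failed: %s" % exc)
```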

cc2qe commented 8 years ago

We've heard about this issue several times, but never with our own BAM files at WashU. Are you able to send a chunk of your BAM files that recreates this error, perhaps the first million lines?

It looks like the insert size distribution is abnormal, since it's allowing reads up to 12,183,219 bp to be called concordant.
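To illustrate how a cutoff that large can arise: the upper concordance cutoff is derived from the sampled insert sizes (on the order of mean plus some multiple of the standard deviation; the multiplier and trimming details here are placeholders, not LUMPY's exact parameters). A handful of enormous apparent insert sizes, e.g. from mis-paired or cross-contig reads, is enough to inflate the standard deviation and hence the cutoff by orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(400, 50, size=10_000)          # a sane ~400 bp library
polluted = np.concatenate([normal, [1.2e7] * 20])  # 20 huge "insert sizes"

def upper_cutoff(inserts, z=10):
    # z is an illustrative stdev multiplier, not the tool's actual setting
    return inserts.mean() + z * inserts.std()

print(int(upper_cutoff(normal)))    # roughly 900 bp
print(int(upper_cutoff(polluted)))  # millions of bp
```

So a cutoff of 12,183,219 is consistent with a few extreme outliers surviving the sampling step rather than with a genuinely wide library.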