virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.
Other
135 stars 43 forks source link

ValueError: cannot convert float NaN to integer #58

Open YaCui opened 4 years ago

YaCui commented 4 years ago

Hi Viraj,

Thanks for your work in AA. Do you know what happened about this error? Thanks for your help.

$]cat test_out.bed 7 54700000 54950000 30.0 /dfs3/weil21-lab/yac7/data/test.bed

--------------

$ python ./src/AmpliconArchitect.py --ref "GRCh37" --bam /dfs3/weil21-lab/yac7/data/test.bam --bed /dfs3/weil21-lab/yac7/data/test_out.bed --out test [root:INFO] #TIME 0.674 Loading libraries and reference annotations for: GRCh37 [root:WARNING] #TIME 1.820 interval_list: Unable to open interval file "/data/users/yumeil1/tools/AmpliconArchitect/data_repo/GRCh37/". [root:INFO] #TIME 2.064 Initiating bam_to_breakpoint object for: /dfs3/weil21-lab/yac7/data/test.bam [root:INFO] #TIME 2.068 Exploring interval: 7 54700000 54950000 30.0 /dfs3/weil21-lab/yac7/data/test.bed

Traceback (most recent call last): File "./src/AmpliconArchitect.py", line 194, in ilist = bamFileb2b.interval_hops(ird) File "/data/users/yumeil1/tools/AmpliconArchitect/src/bam_to_breakpoint.py", line 1504, in interval_hops ii = self.interval_extend(i2) File "/data/users/yumeil1/tools/AmpliconArchitect/src/bam_to_breakpoint.py", line 1632, in interval_extend elif self.interval_amplified(hg.interval(ic.chrom, ic.start - left_size ms_window_size, ic.start), filter_small=False): File "/data/users/yumeil1/tools/AmpliconArchitect/src/bam_to_breakpoint.py", line 1576, in interval_amplified elif filter_small == False and i.size() < 2 ms_window_size and len(self.interval_discordant_edges(i)) >= 2: File "/data/users/yumeil1/tools/AmpliconArchitect/src/bam_to_breakpoint.py", line 1044, in interval_discordant_edges mcdflist.extend(hgl.merge_clusters(extend=self.max_insert - self.read_length)) File "/data/users/yumeil1/tools/AmpliconArchitect/src/hg19util.py", line 447, in merge_clusters ci = ci.merge(a, extend) File "/data/users/yumeil1/tools/AmpliconArchitect/src/hg19util.py", line 282, in merge if not self.intersects(y, extend): File "/data/users/yumeil1/tools/AmpliconArchitect/src/hg19util.py", line 266, in intersects if (int(a[1])-int(b[1]))*(int(a[2])-int(b[1])) <= 0: ValueError: cannot convert float NaN to integer

jchenpku commented 4 years ago

I got similar error. Have your resolved the problem?

virajbdeshpande commented 4 years ago

Apologies for the slow response. I am wondering if this might be an issue due differences in how pysam reads BAM files. Can you try the following branch and let me know if you still get the error: https://github.com/virajbdeshpande/AmpliconArchitect/tree/runtime_discordant_clustering

jchenpku commented 4 years ago

I tried. still the same problem. cat 22Rv1.bed chr10 46952851 47151600 4.67057444071 readDepth/output/alts.dat

[root:INFO] #TIME 6.252 Loading libraries and reference annotations for: hg19 [root:INFO] #TIME 9.820 Initiating bam_to_breakpoint object for: 22Rv1.deduped.bam [root:INFO] #TIME 9.822 Exploring interval: chr10 46952851 47151600 Traceback (most recent call last): File "../AmpliconArchitect/src/AmpliconArchitect.py", line 194, in ilist = bamFileb2b.interval_hops(ird, rdlist=all_ilist) File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/bam_to_breakpoint.py", line 1549, in interval_hops ii = self.interval_extend(i2) File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/bam_to_breakpoint.py", line 1653, in interval_extend elif self.interval_amplified(hg.interval(ic.chrom, ic.end, ic.end + right_size ms_window_size), filter_small=False): File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/bam_to_breakpoint.py", line 1620, in interval_amplified elif filter_small == False and i.size() < 2 ms_window_size and len(self.interval_discordant_edges(i)) >= 2: File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/bam_to_breakpoint.py", line 1121, in interval_discordant_edges mcdflist.extend(hgl.merge_clusters(extend=self.max_insert - self.read_length)) File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/hg19util.py", line 449, in merge_clusters ci = ci.merge(a, extend) File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/hg19util.py", line 284, in merge if not self.intersects(y, extend): File "/media/ssdtmp/ecDNA/AmpliconArchitect/src/hg19util.py", line 268, in intersects if (int(a[1])-int(b[1]))*(int(a[2])-int(b[1])) <= 0: ValueError: cannot convert float NaN to integer

virajbdeshpande commented 4 years ago

Thanks, would it be possible to share the BAM file with me? You can email me on github_username@gmail.com

zhang919 commented 4 years ago

Dear @virajbdeshpande @jchenpku @YaCui , I may have the same problem recently. If you have solved this problem, could you please give me some suggestions? Best. Zhang.

sxv commented 4 years ago

Same problem here. I may have a further hint, with some warnings and 'nan' values preceding the eventual error:

/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.py:356: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:140: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:110: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:132: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
read length 148.52587479399048 nan nan nan 0.0
coverage stats (2.9556649084004105, 2.9863662028398092, 0.8700460119681304, 2.9705174958798093, 3.450005137338035, 2.197522010279815, 148.52587479399048, nan, nan, 0, nan, nan, 0.0) 730
pair support nan
0 1001 70 0.0