virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.
Other
131 stars 41 forks source link

ZeroDivision error #50

Open iprada opened 4 years ago

iprada commented 4 years ago

Dear Viraj,

I am Iñigo Prada-Luengo. The developer behind Circle-Map. I have really enjoyed reading your manuscript and the algorithmic ideas behind AmpliconArchitect. I think it is a great tool, specially for reconstructing the whole amplicon structure.

I am trying to compare the performance of Circle-Map to AmpliconArchitect, however, I have encounter a ZeroDivisionError. I have attached the log and the traceback below.

In case you want to reproduce the error I have run AmpliconArchitect using a fresh miniconda2 installation with the following command:

miniconda2/bin/python2.7 AmpliconArchitect/src/AmpliconArchitect.py --bed aa_seed.bed --bam sorted_10kk.bam --ref None --out test

If you have never encounter this, I will be happy to share the bam file with you as it is a simulated bam file from the hg38 genome. If this is the case we can talk by email. Let me know

Best wishes,

Iñigo

Log and traceback ____ [root:INFO] #TIME 1.061 Loading libraries and reference annotations for: None [root:WARNING] #TIME 0.390 Unable to find reference in $AA_DATA_REPO/REF/file_list.txt. Setting to empty. [root:WARNING] #TIME 0.390 Unable to open fasta file: "/home/iprada/faststorage/projects/a_architect/data_repo/None/". Reference sequences will be set to N. [root:WARNING] #TIME 0.390 Unable to open chromosome lengths file: "/home/iprada/faststorage/projects/a_architect/data_repo/None/" [root:WARNING] #TIME 0.390 interval_list: Unable to open interval file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". [root:WARNING] #TIME 0.390 interval_list: Unable to open interval file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". [root:WARNING] #TIME 0.390 interval_list: Unable to open interval file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". [root:WARNING] #TIME 0.390 interval_list: Unable to open interval file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". [root:WARNING] #TIME 0.390 interval_list: Unable to open interval file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". [root:WARNING] #TIME 0.390 interval_list: Unable to open interval file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". [root:INFO] #TIME 1.736 Initiating bam_to_breakpoint object for: sorted_10kk.bam [root:INFO] #TIME 661.798 Exploring interval: chr10 250748 250749 15 [root:WARNING] #TIME 640.230 rep_content: Unable to open mapability file "/home/iprada/faststorage/projects/a_architect/data_repo/None/". Traceback (most recent call last): File "AmpliconArchitect/src/AmpliconArchitect.py", line 194, in <module> ilist = bamFileb2b.interval_hops(ird) File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 1521, in interval_hops icn = self.interval_neighbors(ic, clist, gcc=gcc) File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 1453, in interval_neighbors msrlist = [self.get_meanshift(i2, ms_window_size0, ms_window_size1, gcc)] File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 695, in get_meanshift msr = self.meanshift_refined(i, window_size0=window_size0, window_size1=window_size1, gcc=gcc) File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 640, in meanshift_refined shifts0 = self.meanshift_segmentation(i, window_size0, gcc, pvalue=0.0027) File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 472, in meanshift_segmentation ms = [w for w in self.meanshift(i, window_size, hb, cov2, rd_global=rd_global, h0=h0, gcc=gcc, n=n)] File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 431, in meanshift dfi = [(cov[wi][0], sum([wj * math.exp(-0.5 * wj ** 2 / hb ** 2) * math.exp(-0.5 * (cov[wi + wj][1] - cov[wi][1])** 2 / hr(wi)**2) for wj in range(-1 * n, n + 1)])) for wi in range(n, len(j) - n) if cov[wi][0] is not None] File "/faststorage/home/iprada/projects/a_architect/AmpliconArchitect/src/bam_to_breakpoint.py", line 430, in hr return math.sqrt(cov[wi][1] / rd_global) * h0 ZeroDivisionError: float division by zero

virajbdeshpande commented 4 years ago

Hello Iñigo,

Thank you for testing AA. I am guessing your sample is a simulated sample with reads simulated only from the region with ecDNA, is that correct? As of now AA is designed to use WGS input files. As a result it raises an error when it estimates the genomic coverage to be 0. There are 2 workarounds for this: 1) (preferred) Use the --cbam, --cbed option where you can simulate reads from 1000 10kbp random regions. Then provide a bed file with the list of the regions and a bam file with simulated reads mapped to the reference. 2) You can add an entry to the file $AA_DATA_REPO/coverage.stats. If you can share the simulation parameters I can provide you the entry to add to the file.

However, you are correct that it will be good to handle cases where the estimated coverage is 0 and raise a warning.