virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.
Other
131 stars 41 forks source link

ZeroDivisionError: float division by zero #138

Open ZHONGHONGXIN opened 1 year ago

ZHONGHONGXIN commented 1 year ago

Hi,I've ran AA with bam file (WGS) with the code below and error message shown as follow: AmpliconArchitect/src/AmpliconArchitect.py --bed seed_50kb.bed --bam pe_q0_redup.bam --sensitivems False --out ./ --ref GRCh38

error: [root:INFO] #TIME 29.130 Exploring interval: chr1 10000 120156419 Traceback (most recent call last): File "AmpliconArchitect/src/AmpliconArchitect.py", line 247, in ilist = bamFileb2b.interval_hops(ird, rdlist=all_ilist) File "AmpliconArchitect/src/bam_to_breakpoint.py", line 1707, in interval_hops ii = self.interval_extend(i2) File "AmpliconArchitect/src/bam_to_breakpoint.py", line 1811, in interval_extend elif self.interval_amplified(hg.interval(ic.chrom, ic.end, ic.end + right_size ms_window_size), filter_small=False): File "AmpliconArchitect/src/bam_to_breakpoint.py", line 1766, in interval_amplified if mc[0] < arm_coverage[0] and w[1] > max(arm_coverage[0] + 3 mc[2] math.sqrt(arm_coverage[0] / mc[0]), arm_coverage[0] + 3.0 mc[0] / 2.0): ZeroDivisionError: float division by zero

Please tell me some advices, thanks.

jluebeck commented 1 year ago

Hi, are you using AmpliconSuite-pipeline to prepare the seeds? Do you have entries in the seeds file that are very small (< 1 kbp)? Is your sequencing data paired-end whole-genome sequencing data that covers all chromosomes?

ZHONGHONGXIN commented 1 year ago

Hi, are you using AmpliconSuite-pipeline to prepare the seeds? Do you have entries in the seeds file that are very small (< 1 kbp)? Is your sequencing data paired-end whole-genome sequencing data that covers all chromosomes?

l use bwa-mem and CNVKIT to prepare the seeds, code as follow: BWA: bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem -t 10 bwa-mem2_index/hg38/hg38.fa out_1_val_1.fq.gz out_2_val2.fq.gz |grep -E -v "chrM|random|chrUn" | samtools view -@ 10 -m 3G -b -o out_pe.bam samtools sort -@ 5 out_pe.bam -o out_pe_sort.bam sambamba markdup -r -t 5 out_pesort.bam out pe_q0redup.bam CNVKIT: cnvkit.py batch out pe_q0_redup.bam --annotate hg38_noENSG.bed --normal --method wgs -f hg38.fa -p 4 --output-reference ./ --output-dir out awk -F '\t' '{if($6>4.5 && $3-$2>50000) print $1 "\t" $2 "\t" $3 "\t" $6 }' out_pe_q0_redup.call.cns |grep -v "end" |grep -v "chrY" > out_seed_50kb.bed AA: AmpliconArchitect.py --bed out_seed_50kb.bed --bam out_pe_q0_redup.bam --sensitivems False --out ./ --ref GRCh38

and my data is data paired-end whole-genome sequencing data. Part of out_seed_50kb.bed as follow: chr21 5246642 5393558 57 chr21 8756715 8996579 1039 chr21 9086357 9557037 5648 chr21 10614464 10814560 74 chr21 13845856 13960862 98 chr22 11251438 11843920 40 chr22 11843920 13009029 1202 chr22 18789561 18849559 703 chr22 21224478 21284476 543

ZHONGHONGXIN commented 1 year ago

Hi, are you using AmpliconSuite-pipeline to prepare the seeds? Do you have entries in the seeds file that are very small (< 1 kbp)? Is your sequencing data paired-end whole-genome sequencing data that covers all chromosomes?

oh, my notice there are > 50 CNV seeds going into AA, maybe couse something wrong.

jluebeck commented 1 year ago

Yes, there are additional steps performed by AmpliconSuite-pipeline to filter the seeds. Please use AmpliconSuite-pipeline to prepare the seed regions for AA.

Thanks, Jens

ZHONGHONGXIN commented 1 year ago

Ok, thank you.