virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.
Other
131 stars 41 forks source link

ValueError: cannot convert float NaN to integer #117

Open acticu opened 2 years ago

acticu commented 2 years ago

Hi,thank you for your work. I tried to use prepareAA accompanied with AA to detect ecDNA on mouse but found there calls a error when I entered the command line.

(HRenv) [renhb@localhost ~]$ python3 PrepareAA/PrepareAA.py -s SRR1046336  -t 8 --cnvkit_dir /home/renhb/.conda/envs/HRenv/bin/cnvkit.py --sorted_bam /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam --ref mm10 --run_AA
2022-01-29 22:10:38.503194
Running PrepareAA on sample: SRR1046336

Running CNVKit batch
/home/renhb/.conda/envs/HRenv/bin/python3 /home/renhb/.conda/envs/HRenv/bin/cnvkit.py batch -m wgs -r /data/renhb/data_repo/mm10/mm10_cnvkit_filtered_ref.cnn -p 8 -d /home/renhb/cnvkit_output/ /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam
CNVkit 0.9.9
Wrote /home/renhb/cnvkit_output/mm10_cnvkit_filtered_ref.target-tmp.bed with 1352886 regions
Wrote /home/renhb/cnvkit_output/mm10_cnvkit_filtered_ref.antitarget-tmp.bed with 0 regions
Running 1 samples in 8 processes (that's 8 processes per bam)
Running the CNVkit pipeline on /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam ...
Processing reads in sorted_trim_SRR1046336.fastq.sam.bam
Time: 85.052 seconds (749301 reads/sec, 15907 bins/sec)
Summary: #bins=1352886, #reads=63729810, mean=47.1066, min=0.0, max=830830.7090909091
Percent reads in regions: 74.636 (of 85387020 mapped)
Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.targetcoverage.cnn with 1352886 regions
Skip processing sorted_trim_SRR1046336.fastq.sam.bam with empty regions file /home/renhb/cnvkit_output/mm10_cnvkit_filtered_ref.antitarget-tmp.bed
Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.antitargetcoverage.cnn with 0 regions
Processing target: sorted_trim_SRR1046336.fastq.sam
Keeping 1333844 of 1352886 bins
Correcting for GC bias...
Processing antitarget: sorted_trim_SRR1046336.fastq.sam
Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cnr with 1333844 regions
Segmenting /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cnr ...
Segmenting with method 'cbs', significance threshold 1e-06, in 8 processes
Post-processing /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cns ...
Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cns with 101 regions
Applying filter 'ci'
Filtered by 'ci' from 101 to 52 rows
Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3
Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.call.cns with 52 regions
Significant hits in 339291/1333844 bins (25.4%)
-Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.bintest.cns with 339291 regions

Running CNVKit segment
/home/renhb/.conda/envs/HRenv/bin/python3 /home/renhb/.conda/envs/HRenv/bin/cnvkit.py segment /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cnr  -p 8 -m cbs -o /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cns
Segmenting with method 'cbs', significance threshold 0.0001, in 8 processes
Wrote /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam.cns with 101 regions

Running amplified_intervals
/home/renhb/miniconda3/envs/py27/bin/python2 /home/renhb/AmpliconArchitect/src/amplified_intervals.py --ref mm10 --bed /home/renhb/cnvkit_output/sorted_trim_SRR1046336.fastq.sam_CNV_GAIN.bed --bam /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam --gain 4.5 --cnsize_min 50000 --out     /home/renhb/SRR1046336_AA_CNV_SEEDS
Global ref name is mm10

Running AA with default arguments (& downsample 10). To change settings run AA separately.
/home/renhb/miniconda3/envs/py27/bin/python2 /home/renhb/AmpliconArchitect/src/AmpliconArchitect.py --ref mm10 --downsample 10 --bed /home/renhb/SRR1046336_AA_CNV_SEEDS.bed --bam /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam --runmode FULL --out             /home/renhb//SRR1046336_AA_results/SRR1046336
[root:INFO]     Commandline: /home/renhb/AmpliconArchitect/src/AmpliconArchitect.py  --ref  mm10  --downsample  10  --bed  /home/renhb/SRR1046336_AA_CNV_SEEDS.bed  --bam  /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam  --runmode  FULL  --out  /home/renhb//SRR1046336_AA_results/SRR1046336
[root:INFO]     AmpliconArchitect version 1.2

[root:INFO]     #TIME 5.840     Loading libraries and reference annotations for: mm10
Global ref name is mm10
[root:INFO]     #TIME 50.993    Initiating bam_to_breakpoint object for: /data/renhb/ESC_E14_WGS/sra/fastq/sorted_trim_SRR1046336.fastq.sam.bam
[root:INFO]     #TIME 50.995    Exploring interval: chr8        45006258        45059178
[root:INFO]     #TIME 54.884     Searching new neighbors for interval: chr8     44906258        45159178
[root:INFO]     #TIME 54.955     Calculating coverage meanshift segmentation
[root:INFO]     #TIME 59.492     Detecting breakpoint edges
[root:INFO]     #TIME 59.518     Selecting neighbors
[root:INFO]     #TIME 59.508    Interval sets for amplicons determined:
[root:INFO]     [amplicon1]     chr8:44906258-45159178
[root:INFO]     #TIME 59.508    Reconstructing amplicon1
[root:INFO]     #TIME 59.509     Calculating coverage meanshift segmentation
[root:INFO]     #TIME 59.509     Detecting breakpoint edges
[root:INFO]     #TIME 59.509     Building breakpoint graph
Traceback (most recent call last):
  File "/home/renhb/AmpliconArchitect/src/AmpliconArchitect.py", line 286, in <module>
    bamFileb2b.interval_filter_vertices(ilist, amplicon_name=amplicon_name, runmode=args.runmode)
  File "/home/renhb/AmpliconArchitect/src/bam_to_breakpoint.py", line 1906, in interval_filter_vertices
    koe[ne] = len(self.interval_crossing_arcs(i.chrom, i.start, i.start + self.max_insert, -1, ilist))
  File "/home/renhb/AmpliconArchitect/src/bam_to_breakpoint.py", line 751, in interval_crossing_arcs
    return [a for a in self.fetch(chrom, max(0, start), min(end, hg.chrLen[hg.chrNum(chrom)]))
  File "/home/renhb/AmpliconArchitect/src/bam_to_breakpoint.py", line 147, in fetch
    for a in self.bamfile.fetch(c, s, e + 1):
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 658, in pysam.libchtslib.HTSFile.parse_region
ValueError: cannot convert float NaN to integer

Could you please help me troubleshoot that? I already reinstalled the numpy and tried the version 1.19 and 1.19.5 and 1.20 but didn't work. Thank you.

Kind regard, Brian

jluebeck commented 2 years ago

Hi Brian,

Sorry for the delayed response. Thanks for reaching out about this. I am curious if this data you are running AA on is paired-end WGS data? AA currently only supports analysis of paired-end WGS.

If this is indeed paired-end WGS, then I think this may be an issue with the Pysam version. Can you check which Pysam version you have? Is it greater than 0.9.0?

It is also possible it is something related to the BAM file itself. Can I ask what aligner was used to generate the BAM file? Are the reads in the BAM file derived from the entire genome or just a portion of it?

Thanks, Jens

acticu commented 2 years ago

Hi Jens, Sorry for the late response and I checked the data I obtained from SRA and found that they are single-end sequencing data. I think that is why I got NaN because of the lack of Read 2. Appreciate for your help!

Kind regards, Brian