kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer #88

Closed: kfuku52 closed this issue 2 years ago

kfuku52 commented 2 years ago
--- getfastq 1st-round sequence generation report ---
Target size (--max_bp): 30,000,000,000 bp
Sum of fastq_dump dumped reads: 5,445,101,548 bp
Sum of fastq_dump rejected reads: 9,147,764 bp
Sum of fastq_dump written reads: 5,444,767,944 bp
Sum of fastp input reads: 5,436,287,388 bp
Sum of fastp output reads: 4,602,039,126 bp
Individual SRA IDs: SRR6008321 SRR6008322 SRR6008323 SRR6008324 SRR6008325 SRR6008326 SRR6008327 SRR6008328 SRR6008329
Individual fastq_dump dumped reads (bp): 1,350,581,200 1,229,669,600 2,864,850,748 0 0 0 0 0 0
Individual fastq_dump rejected reads (bp): 4,579,600 4,075,200 492,964 0 0 0 0 0 0
Individual fastq_dump written reads (bp): 1,350,404,200 1,229,514,000 2,864,849,744 0 0 0 0 0 0
Individual fastp input reads (bp): 1,346,178,600 1,225,750,000 2,864,358,788 0 0 0 0 0 0
Individual fastp output reads (bp): 1,276,549,636 1,153,840,212 2,171,649,278 0 0 0 0 0 0

97.19% of reads were obtained in the 1st-round sequence generation.
The amount of generated reads were 2.81% (4,602,039,126/30,000,000,000) smaller than the target size (tol=1%).
Starting the 2nd-round sequence extraction to compensate it.
Traceback (most recent call last):
  File "/opt/conda/envs/biotools/bin/amalgkit", line 378, in <module>
    args.handler(args)
  File "/opt/conda/envs/biotools/bin/amalgkit", line 33, in command_getfastq
    getfastq_main(args)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/amalgkit/getfastq.py", line 821, in getfastq_main
    seq_summary = sequence_extraction_2st_round(args, sra_stat, output_dir, seq_summary, gz_exe, ungz_exe,
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/amalgkit/getfastq.py", line 714, in sequence_extraction_2st_round
    seq_summary = calc_2nd_ranges(seq_summary)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/amalgkit/getfastq.py", line 501, in calc_2nd_ranges
    sra_target_reads = (sra_target_bp / seq_summary['spot_length']).astype(int)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 591, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1168, in astype_nansafe
    return astype_float_to_int_nansafe(arr, dtype, copy)
  File "/opt/conda/envs/biotools/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1213, in astype_float_to_int_nansafe
    raise IntCastingNaNError(
pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
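The traceback shows the failure at the cast `(sra_target_bp / seq_summary['spot_length']).astype(int)` in `calc_2nd_ranges`. One plausible cause, assumed here and not confirmed by the report, is that the six runs that dumped 0 bp (SRR6008324 to SRR6008329) end up with a `spot_length` of 0 or NaN. In that case the division yields `inf` (x/0) or `NaN` (0/0), and neither value can be cast to an integer. The sketch below reproduces the error on hypothetical values and shows one NaN-safe variant of the cast. The helper `target_reads_nansafe` and all the numbers are illustrative only. This is not amalgkit's actual fix.

```python
import numpy as np
import pandas as pd


def target_reads_nansafe(sra_target_bp: pd.Series, spot_length: pd.Series) -> pd.Series:
    """Hypothetical helper: convert per-run bp targets to read counts.

    Non-finite quotients (NaN from 0/0, inf from x/0) are mapped to 0 reads
    instead of being passed to astype(int), which would raise.
    """
    reads = sra_target_bp / spot_length
    # Keep finite values as-is; replace NaN/inf with 0 so the int cast succeeds.
    reads = reads.where(np.isfinite(reads), 0)
    return reads.astype(int)


# Hypothetical inputs: two runs with data, plus two runs that yielded 0 bp
# and are assumed to have a spot length of 0.
target_bp = pd.Series([3_000_000.0, 1_500_000.0, 0.0, 2_000_000.0])
spot_length = pd.Series([200.0, 100.0, 0.0, 0.0])

# Reproduce the failing cast from calc_2nd_ranges.
reproduced_error = None
try:
    (target_bp / spot_length).astype(int)
except ValueError as err:  # pandas.errors.IntCastingNaNError subclasses ValueError
    reproduced_error = err

print("reproduced:", reproduced_error)
print(target_reads_nansafe(target_bp, spot_length).tolist())  # [15000, 15000, 0, 0]
```

Mapping non-finite targets to 0 reads assumes that runs with no usable data should simply be skipped in the 2nd round. Excluding those runs from `seq_summary` before the division would be an equally reasonable alternative.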