jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License
182 stars 27 forks source link

genrich fails with floating point error (136) on too much input #72

Closed malcook closed 3 years ago

malcook commented 3 years ago

I am getting floating point error while using genrich on

314,153,343 TOTAL READS

coming from seven name sorted bams of replicate atac-seq data with

read count 21,001,420 31,212,472 56,249,110 43,494,211 53,324,666 42,859,376 66,012,088

any advice or request for data to reproduce?

my workaround is to drop some replicates.

jsh58 commented 3 years ago

At what stage does the error occur? You should get some indication of this from the verbose output.

If there is a subset of the data on which I can reproduce the data, please provide it and I will have a look.

malcook commented 3 years ago

running

Genrich --version Genrich, version 0.6 Copyright (C) 2018 John M. Gaspar (jsh58@wildcats.unh.edu)

run1 quit with floating point error output ends right after it shows it has begun processing the 2nd ("#1") file, as below. run2 I swapped the order of the 1st two input files. Surprise! It completed: "Peaks identified: 18324 (6097652bp)" run3 I swapped them back (superstition) and same error as run1. run4 I ran with just 1st 2 input file and same error same error as run1

Genrich --verbose -S -j -f AC_LL_H_000.narrowPeak.pq.bedgraph -r -e chrM -g 300 -a 5.0 -y -t 'AC_LL_H_000.10.n.bam AC_LL_H_000.3.n.bam' -q .05 -o AC_LL_H_000.narrowPeak &> AC_LL_H_000.narrowPeak.stderrout

run1 log file: https://research.stowers.org/mec/genrich72/AC_LL_H_000.narrowPeak.stderrout

2 input files: https://research.stowers.org/mec/genrich72/

jsh58 commented 3 years ago

I am unable to download AC_LL_H_000.10.n.bam. I get only a truncated 62KB file.

malcook commented 3 years ago

apologies - can you please find them instead in https://ftp.stowers.org/#/public/mec/genrich72/

jsh58 commented 3 years ago

Thanks for the files. I was able to reproduce the error, diagnose the bug, and fix it. It occurred because the file AC_LL_H_000.3.n.bam had no unpaired alignments, and Genrich failed to recognize this and threw this error when trying to remove PCR duplicates from this nonexistent set.

I reran the updated Genrich on the input files, twice:

$ ~/Genrich/Genrich -t AC_LL_H_000.10.n.bam,AC_LL_H_000.3.n.bam -echrM -g300 -a5 -q0.05 -o run1.peak -jyrv
$ ~/Genrich/Genrich -t AC_LL_H_000.3.n.bam,AC_LL_H_000.10.n.bam -echrM -g300 -a5 -q0.05 -o run2.peak -jyrv

They produced identical outputs, as expected. Here is the verbose output:

Peak-calling parameters:
  Genome length: 1371702787bp
  Significance threshold: -log(q) > 1.301
  Min. AUC: 5.000
  Max. gap between sites: 300bp
Peaks identified: 1450 (681706bp)

Sorry for the difficulty. Thanks for letting me know and providing the files!

malcook commented 3 years ago

Fantastic!

Will you make another versioned release I can refer to in my METHODS?

Cheers,

Malcolm

jsh58 commented 3 years ago

Yes, I just made v0.6.1.