macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
695 stars 270 forks source link

"-1" Summit Values lead to "TypeError: unorderable types: NoneType() < int()" with IDR #215

Open hardigan opened 6 years ago

hardigan commented 6 years ago

Cross-posting this issue from nboley/idr repository, as I think the issue using IDR is related to the output from MACS2 call peak:

Trying to call peaks on an ATAC-seq sample (using the ENCODE ATAC-seq pipeline from kundajelab/atac_dnase_pipelines) which uses macs2 for peak calling.

My macs2 command is:

macs2 callpeak \
-t /gpfs/gpfs1/home/ATAC_SEQ/Sample_1.trim.PE2SE.nodup.tn5.tagAlign.gz\
-f BED \
-n "/gpfs/gpfs1/home/ATAC_SEQ/Sample_1.trim.PE2SE.nodup.tn5.pf"
-g "hs" \
-p 0.01 \
--nomodel --shift -37 --extsize 73 -B --SPMR --keep-dup all --call-summits 

Everything works fine on some samples, but occasionally during the pipeline (and if i run the steps individually myself) IDR after calling peaks will fail with the following error:

Traceback (most recent call last):
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1517, in run_script
    exec(code, namespace, namespace)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 840, in main
    merged_peaks, signal_type = load_samples(args)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 760, in load_samples
    oracle_pks, args.use_nonoverlapping_peaks)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 281, in merge_peaks
    use_nonoverlapping_peaks=use_nonoverlapping_peaks)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 223, in merge_peaks_in_contig
    all_intervals.sort()
TypeError: unorderable types: int() < NoneType()

I found another user with the same issue after using MACS2 ( https://github.com/nboley/idr/issues/27), and it appears that the conversion of "-1" summit values called by macs2 is causing IDR to convert these summits to "None".

head -10 of macs2 call peak output file sorted with "-1" summit values:

chrom   chromStart  chromEnd    name         score  strand  signal  pvalue  qvalue  peak
chr6    143608889   143608962   Peak_367001 29  .   2.781642.96535  0.83482 -1
chr21   18864214    18864362    Peak_366686 29  .   2.781642.96535  0.83482 -1
chr20   36662599    36662672    Peak_366672 29  .   2.781642.96535  0.83482 -1
chr13   50516550    50516623    Peak_366393 29  .   2.781642.96535  0.83482 -1
chr18   28161128    28161201    Peak_367933 29  .   2.767592.93914  0.81626 -1
chr1    145154724   145154839   Peak_367698 29  .   2.767592.93914  0.81626 -1
chr7    121425112   121425231   Peak_369596 29  .   2.753682.91339  0.79701 -1
chr7    114996427   114996500   Peak_369591 29  .   2.753682.91339  0.79701 -1
chr4    74529892    74529965    Peak_369431 29  .   2.753682.91339  0.79701 -1
chr13   33310992    33311065    Peak_368979 29  .   2.753682.91339  0.79701 -1

gets imported into IDR with "-1" summit lines reading like as:

(Peak(chrm='chr1', strand='.', start=176983596, stop=176983716, signal=2.66479, summit=None, signalValue=2.65491, pValue=2.66479, qValue=0.72361), 0),

I suppose I am just wondering how a "-1" summit can arise when using macs2, i.e. if they are a meaningful value or just a "failed peak" that must be discarded before further analysis.

For reference, in a peak file of ~ 500k peaks, there are roughly 900 with "-1" summit values.

Is there anyone who has seen this issue or can provide some insight on its origin or how to deal with it?

Thanks!

amirshams84 commented 6 years ago

What I did is just converting any -1 to 0 with a sed command sed 's/-1/0/' in > out