Cross-posting this issue from the nboley/idr repository, as I think the IDR failure is related to the output of MACS2 callpeak:
I am trying to call peaks on an ATAC-seq sample using the ENCODE ATAC-seq pipeline (kundajelab/atac_dnase_pipelines), which uses macs2 for peak calling.
Everything works fine on some samples, but occasionally, both within the pipeline and when I run the steps individually myself, IDR fails after peak calling with the following error:
Traceback (most recent call last):
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1517, in run_script
    exec(code, namespace, namespace)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 840, in main
    merged_peaks, signal_type = load_samples(args)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 760, in load_samples
    oracle_pks, args.use_nonoverlapping_peaks)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 281, in merge_peaks
    use_nonoverlapping_peaks=use_nonoverlapping_peaks)
  File "/gpfs/gpfs1/home/miniconda3/envs/bds_atac_py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-linux-x86_64.egg/idr/idr.py", line 223, in merge_peaks_in_contig
    all_intervals.sort()
TypeError: unorderable types: int() < NoneType()
My macs2 command is:
I found another user with the same issue after using MACS2 (https://github.com/nboley/idr/issues/27), and it appears that IDR converts the "-1" summit values reported by macs2 to "None", which then cannot be compared with integer values when the intervals are sorted.
head -10 of the macs2 callpeak output file, sorted to show the "-1" summit values:
This file gets imported into IDR with the "-1" summit lines reading as:
(Peak(chrm='chr1', strand='.', start=176983596, stop=176983716, signal=2.66479, summit=None, signalValue=2.65491, pValue=2.66479, qValue=0.72361), 0),
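To illustrate why a single summit=None could break the sort, here is a minimal sketch. The Peak field names and values are copied from the record above, but the summit of 60 and the (start, stop, summit) tuple layout are my assumptions for illustration, not IDR's actual internals. If the real intervals look anything like this, the comparison would only reach the summit when the earlier fields tie, which might explain why only some samples fail.

```python
from collections import namedtuple

# Field names taken from the Peak record printed by IDR above.
Peak = namedtuple(
    "Peak",
    "chrm strand start stop signal summit signalValue pValue qValue",
)

def sort_intervals(peaks):
    """Sort hypothetical (start, stop, summit) tuples with a plain list.sort()."""
    # ASSUMPTION: this tuple layout is illustrative, not copied from idr.py.
    intervals = [(p.start, p.stop, p.summit) for p in peaks]
    intervals.sort()
    return intervals

def sort_fails(peaks):
    """Return True if sorting the peaks raises TypeError, False otherwise."""
    try:
        sort_intervals(peaks)
    except TypeError:
        return True
    return False

# Same coordinates as the record above; the summit of 60 is a made-up value.
with_summit = Peak("chr1", ".", 176983596, 176983716,
                   2.66479, 60, 2.65491, 2.66479, 0.72361)
no_summit = with_summit._replace(summit=None)
elsewhere = with_summit._replace(start=1000, stop=1120)

# Identical start/stop: the comparison falls through to int vs None.
print(sort_fails([with_summit, no_summit]))
# Different start: the order is settled before the summit is ever compared.
print(sort_fails([elsewhere, no_summit]))
```

On Python 3 the first case raises a TypeError like the one in the traceback, while the second sorts without complaint because the start coordinates already decide the order.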
I suppose I am just wondering how a "-1" summit can arise when using macs2, i.e. whether it is a meaningful value or just a "failed peak" that must be discarded before further analysis.
For reference, in a peak file of ~500k peaks, roughly 900 have "-1" summit values.
Has anyone seen this issue, or can anyone provide some insight into its origin or how to deal with it?
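In case it helps anyone reproduce the count, this is roughly how the affected lines could be separated out. It is only a sketch: it assumes a tab-separated narrowPeak file in which the 10th column holds the summit offset (with -1 meaning no summit was called), and the example lines are made up rather than taken from my data. Whether dropping these peaks is actually the right thing to do is exactly what I am unsure about.

```python
SUMMIT_COLUMN = 9  # 0-based index of the 10th narrowPeak column (summit offset)

def split_by_summit(lines):
    """Split narrowPeak lines into (kept, dropped); dropped have summit == -1."""
    kept, dropped = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > SUMMIT_COLUMN and fields[SUMMIT_COLUMN] == "-1":
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped

# Hypothetical example lines, NOT real macs2 output.
EXAMPLE_LINES = [
    "chr1\t100\t220\tpeak_1\t26\t.\t2.65\t2.66\t0.72\t-1\n",
    "chr1\t500\t800\tpeak_2\t90\t.\t5.10\t9.00\t6.50\t140\n",
]

kept, dropped = split_by_summit(EXAMPLE_LINES)
print(len(kept), "kept;", len(dropped), "with -1 summit")
```

For a real file, the same function could be fed an open file handle, and the kept lines written to a new file before running IDR on it.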
Thanks!