nboley / idr

IDR
GNU General Public License v2.0

--input-file-type bed expecting 9 columns #65

Open perinom opened 1 year ago

perinom commented 1 year ago

Hello,

I'm trying to run IDR (v2.0.4.2, the latest on Conda) with a 6-column BED file (extended peak summits from MACS2) using --input-file-type bed --rank 5, but it fails with:

[...]
File "mydir/snakemake/02/.snakemake/conda/ee8bd795864fb92a3b69fa9e4029ccf7_/lib/python3.7/site-packages/idr/idr.py", line 65, in load_bed
    float(data[6]), float(data[7]), float(data[8])
IndexError: list index out of range

It seems to me that IDR is trying to read a 9-column file (like a narrowPeak) and failing. Editing the .bed by adding 3 empty columns and shifting column 5 (the BED score) to column 7 (the narrowPeak signalValue) works, but it's not the cleanest of workarounds.
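For reference, the padding workaround described above can be scripted roughly as follows. This is only a sketch: pad_bed6_to_9 is a hypothetical helper, not part of IDR, and it makes two assumptions. It copies the score into column 7 rather than moving it, and it fills columns 8 and 9 with -1 (the narrowPeak convention for "no value") so that float() can still parse them.

```python
def pad_bed6_to_9(line, fill="-1"):
    """Pad one tab-separated 6-column BED line to 9 columns.

    Hypothetical helper, not part of IDR. The BED score (column 5) is
    copied into column 7; columns 8 and 9 get `fill` (an assumption).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 9:
        # Already wide enough (e.g. a narrowPeak line): leave untouched.
        return "\t".join(fields)
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got {}".format(len(fields)))
    score = fields[4]
    return "\t".join(fields + [score, fill, fill])


def pad_bed_file(in_path, out_path):
    """Rewrite a 6-column BED file as a 9-column one, keeping header lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                dst.write(line)
                continue
            dst.write(pad_bed6_to_9(line) + "\n")
```

With a file padded this way, the score is available in both column 5 and column 7, so either can be selected for ranking.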

Running IDR on the narrowPeak file from the same MACS2 call that produced the summits works, while it fails with the same error on every other 6-column BED file I tried.

Am I missing something?

Thanks, Matteo

perinom commented 1 year ago

Yes, indeed.

load_bed() tries to parse BED lines into a fixed-length namedtuple(), which isn't flexible enough for the BED format.

Peak = namedtuple(
    'Peak', ['chrm', 'strand', 'start', 'stop', 'signal', 'summit', 'signalValue', 'pValue', 'qValue'])
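
One way the parsing could be made tolerant of shorter BED lines is sketched below. This is not IDR's actual code: parse_bed_line and _optional_float are hypothetical names, signal_index is assumed to be a 0-based column index, and missing columns are filled with None or '.' as placeholders. Whether the rest of IDR accepts None in those fields is not verified here.

```python
from collections import namedtuple

# Same field layout as the Peak tuple quoted above.
Peak = namedtuple(
    'Peak', ['chrm', 'strand', 'start', 'stop', 'signal', 'summit',
             'signalValue', 'pValue', 'qValue'])


def _optional_float(fields, index):
    """Return fields[index] as a float, or None if absent or non-numeric."""
    try:
        return float(fields[index])
    except (IndexError, ValueError):
        return None


def parse_bed_line(line, signal_index):
    """Parse one BED line with 3 to 9+ columns into a Peak (hypothetical)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    if signal_index >= len(fields):
        raise ValueError("no column at index {}".format(signal_index))
    # Strand is the 6th BED column; fall back to '.' when it is missing.
    strand = fields[5] if len(fields) > 5 else '.'
    return Peak(
        fields[0], strand,
        int(float(fields[1])), int(float(fields[2])),
        float(fields[signal_index]), None,
        _optional_float(fields, 6),
        _optional_float(fields, 7),
        _optional_float(fields, 8))
```

For example, parse_bed_line("chr1\t100\t200\tpeak1\t55\t+", 4) takes the signal from the score column and leaves signalValue, pValue and qValue as None instead of raising IndexError.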

I'm not sure whether IDR later uses signal (which corresponds to the column index given with --rank) or signalValue.