nboley / idr

IDR
GNU General Public License v2.0

--rank option with arg 'p.value' raises a Value Error #22

lzamparo closed this issue 8 years ago

lzamparo commented 8 years ago

When I call idr like this:

    idr --verbose --samples $1 $2 --input-file-type narrowPeak --rank p.value -o $outdir/$outfile 2>$outdir/idr-errors.txt

IDR raises a ValueError about the column I'm using to rank my peaks:

    Loading the peak files
    Traceback (most recent call last):
      File "/Users/zamparol/anaconda/envs/py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-macosx-10.6-x86_64.egg/idr/idr.py", line 717, in load_samples
        signal_index = int(args.rank) - 1
    ValueError: invalid literal for int() with base 10: 'p.value'

I'm trying to rank the peaks in my narrowPeak files by p-value. The usage string (and other issues in this repo) suggests that --rank p.value is the proper way to request ranking by p-value. Any ideas why I'm seeing this?
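The failing line from the traceback can be reproduced in isolation. The sketch below is not IDR code: `parse_rank_as_column` is a hypothetical helper that only mirrors the expression `int(args.rank) - 1` shown in the traceback, to illustrate that a column name such as 'p.value' cannot be parsed as a 1-based column number.

```python
def parse_rank_as_column(rank):
    """Hypothetical helper mirroring the traceback's `int(args.rank) - 1`.

    Treats `rank` as a 1-based column number and returns the 0-based index.
    """
    return int(rank) - 1


# A numeric string is accepted and converted to a 0-based index.
numeric_index = parse_rank_as_column("8")

# A column *name* is not a valid integer literal, so int() raises ValueError.
try:
    parse_rank_as_column("p.value")
except ValueError as err:
    message = str(err)
    print(message)
```

If that code path is reached, any non-numeric value for --rank fails the same way, whichever column name is used.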

My version:

    (py3) mski1743:day4 zamparol$ idr --version
    IDR 2.0.3
    (py3) mski1743:day4 zamparol$ python --version
    Python 3.5.2 :: Continuum Analytics, Inc.

lzamparo commented 8 years ago

By the way, $1 and $2 resolve to narrowPeak files. I found the problem; it's in the parsing code at IDR/load_bed.py:

def load_samples(args):
    # decide what aggregation function to use for peaks that need to be merged
    idr.log("Loading the peak files", 'VERBOSE')
    if args.input_file_type in ['narrowPeak', 'broadPeak']:
        if args.rank == None: signal_type = 'signal.value'
        else: signal_type = args.rank

        try: 
            signal_index = {"score": 4, "signal.value": 6, 
                            "p.value": 7, "q.value": 8}[signal_type]
        except KeyError:
            raise ValueError(
                "Unrecognized signal type for {} filetype: '{}'".format(
                    args.input_file_type, signal_type))

        if args.peak_merge_method != None:
            peak_merge_fn = {
                "sum": sum, "avg": mean, "min": min, "max": max}[
                    args.peak_merge_method]
        elif signal_index in (4,6):
            peak_merge_fn = sum
        else:
            peak_merge_fn = min
        if args.input_file_type == 'narrowPeak':
            summit_index = 9    ### <--- this was causing it for me, since I throw away 
                                ### everything past column 8

I'll work on a PR that validates narrowPeak and broadPeak files.
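A minimal sketch of the kind of check such a PR could add, not the actual IDR implementation. It assumes the standard ENCODE layouts (10 columns for narrowPeak, 9 for broadPeak) and whitespace-separated fields. The names `EXPECTED_COLUMNS` and `validate_peak_lines` are hypothetical and not part of IDR.

```python
# Assumed column counts: narrowPeak is BED6+4 (the 10th column is the summit
# offset), broadPeak is BED6+3.
EXPECTED_COLUMNS = {"narrowPeak": 10, "broadPeak": 9}


def validate_peak_lines(lines, file_type):
    """Check that every record has enough columns for `file_type`.

    Returns the number of records checked; raises ValueError on the first
    record that is too short. Blank lines and header lines are skipped.
    """
    if file_type not in EXPECTED_COLUMNS:
        raise ValueError("Unsupported file type: '{}'".format(file_type))
    expected = EXPECTED_COLUMNS[file_type]

    n_records = 0
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines, comments, and genome-browser header lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        n_fields = len(stripped.split())
        if n_fields < expected:
            raise ValueError(
                "Line {}: expected at least {} columns for {} but found {}".format(
                    line_number, expected, file_type, n_fields))
        n_records += 1
    return n_records
```

Run before the samples are loaded, a check like this would turn a file truncated to eight columns into an explicit message naming the offending line, rather than a failure later when the summit column is read.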