vd4mmind / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

Tabixfile.fetch() should raise more specialized exception when a region can not be found in tabix file #141

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The Tabixfile.fetch() method will raise a ValueError if called with a region 
(i.e., chromosome name) not contained in the tabix file. Unfortunately, this 
can cause ambiguous behavior in downstream code, as the following example shows:

def get_field_values(region, start, end):
    desired_field_values = []
    try:
        # The line below raises ValueError if region doesn't exist
        record_iter = tabixfile.fetch(
                region, start, end, parser=pysam.asTuple())
        for record in record_iter:
            # but the line below also raises ValueError if record[3]
            # can't be converted to an integer (e.g., int('purple'))
            field_desired = int(record[3])
            desired_field_values.append(field_desired)
    # In the line below, I *think* that I'm returning an empty list
    # because there were no entries in the tabix file, but I could also
    # be accidentally catching a ValueError from int(record[3])!!
    except ValueError as e:
        pass    # there were no records in this region; return an empty list
    return desired_field_values

The use of ValueError is insufficient for distinguishing the case where a tabix 
doesn't have any records for the region I'm trying to fetch from the case where 
the contents of the fourth field in the tabix file do not meet my expectations 
of being an integer.

Tabixfile.fetch() should return a more specific error so that users know that 
when they catch it, they are only catching errors raised when a region doesn't 
exist in the tabix file. I would recommend calling it RegionNotFound because it 
is explicit in what the problem is.

For backwards compatibility, you can subclass RegionNotFound from ValueError. 
At least in this case one can distinguish between fetch not finding any region 
of the given name from a generic ValueError thrown elsewhere. Subclassing 
ValueError still convolutes catching RegionNotFound the user actually only 
wanted to catch a ValueError raised elsewhere.

Original issue reported on code.google.com by chris.la...@gmail.com on 24 Oct 2013 at 8:23

GoogleCodeExporter commented 8 years ago
Hi Chris,

thank you for your suggestion. I agree that the errors could be more specific.

As I currently see, tabix works as follows:

1. Empty region: No error
2. Invalid region (negative coordinate, end < start): ValueError
3. Unknown chromosome: ValueError
4. Region out of range: ValueError

Viewing a Tabixfile as a container - would the following be consistent:

1. Empty region: no error
2. Invalid region: ValueError
3. Unknown chromosme: KeyError
4. Region out of range: IndexError

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 4 Nov 2013 at 8:11

GoogleCodeExporter commented 8 years ago
Hi Chris,

I have implemented the changes proposed above.

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 26 Nov 2013 at 9:33