mitreac / b575f19

UM DCMB BIOINF 575 Fall 2019 class repo
BSD 3-Clause "New" or "Revised" License
9 stars 6 forks source link

Under which circumstances will `read.pos` be negative? #16

Open CooperStansbury opened 5 years ago

CooperStansbury commented 5 years ago

I'm noticing that some of the reads from the tumor.bam file have a read.pos == -1. As far as I can tell, read.pos is defined when bam.py unpacks data (Exact line HERE):

self.refID, self.pos = _unpack_refId_pos(self._range_popper(8))

But I don't know enough to understand when this would return a -1 value. What conditions gove rise to -1 here?

Here's a test to reproduce this:

def get_positions(filename):
    bam = bs.AlignmentFile(filename)
    return [read.pos for read in bam]

tumor_pos  = get_positions('tumor.bam')
normal_pos  = get_positions('normal.bam')

print("T min {},T max {}".format(min(tumor_pos), max(tumor_pos)))
print("N min {},N max {}".format(min(normal_pos), max(normal_pos)))

print(sorted(tumor_pos)[:100])
betteridiot commented 5 years ago

These are cases when reads are not aligned or listed as supplementary.

With regard to the homework, I said not to worry about read quality fully expecting these results. However, if one were so inclined to worry about them the code for process_bam would look like this:

def process_bam(filename, sample_name, genome_positions = None):
    with bs.AlignmentFile(filename) as bam:
        for read in bam:

            # Filter out unmapped and supplementary reads:
            # https://broadinstitute.github.io/picard/explain-flags.html
            if not read.flag & 2052:
                # do your stuff here

    return genome_positions

Again, I am doing this for simplicity's sake and not require any QC, but the above code would alleviate the issue you brought up.

CooperStansbury commented 5 years ago

@betteridiot Cool beans. I was just curious. This can be resolved.

betteridiot commented 5 years ago

Additionally, one could just do a hard check for read.pos >= 0, but there are many definitions to read position and what read position is associated with.