jpiper / pyDNase

Python module for the easy handling and analysis of DNase-seq data
http://jpiper.github.io/pyDNase
MIT License
37 stars 24 forks source link

alignedread.aend is None sometimes and wellington_footprints.py crashes #16

Closed JohnReid closed 8 years ago

JohnReid commented 8 years ago

Hi, I have been running into a problem where alignedread.aend returns None occasionally and wellington_footprints.py exists with the following stack trace:

+ wellington_footprints.py -fdrlimit -5 /home/jer15/Dev/Saturn/Data/DNASE/peaks/relaxed/DNASE.K562.relaxed.bed /home/jer15/Dev/Saturn/Data/DNASE/bams/DNASE.K562.biorep2.techrep16.bam /home/jer15/Dev/Saturn/Data/DNASE/bams/footprints/K562/relaxed/2/16/                                           
Reading BED File...                                                                                                                                                                                                                                                                                   
Calculating footprints...                                                                                                                                                                                                                                                                             
Traceback (most recent call last):                                                                                                                                                                                                                                                                    
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/bin/wellington_footprints.py", line 150, in <module>                                                                                                                                                                                             
    multiWellington(orderedbychr,reads, shoulder_sizes = clargs.shoulder_sizes ,footprint_sizes = clargs.footprint_sizes, FDR_cutoff=clargs.FDR_cutoff,FDR_iterations=clargs.FDR_iterations,bonferroni = clargs.bonferroni)                                                                           
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/bin/wellington_footprints.py", line 140, in multiWellington                                                                                                                                                                                      
    fp = footprinting.wellington(i,reads,**kwargs)                                                                                                                                                                                                                                                    
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/footprinting/__init__.py", line 46, in __init__                                                                                                                                                              
    self.reads        = reads[self.interval]                                                                                                                                                                                                                                                          
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 192, in __getitem__                                                                                                                                                                       
    return self.get_cut_values(vals)                                                                                                                                                                                                                                                                  
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 182, in get_cut_values                                                                                                                                                                    
    retval = self.__lookupReadsWithoutCache(startbp,endbp,chrom)                                                                                                                                                                                                                                      
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 122, in __lookupReadsWithoutCache                                                                                                                                                         
    a = int(alignedread.aend)                                                                                                                                                                                                                                                                         
TypeError: int() argument must be a string or a number, not 'NoneType'

The docs for pysam say

aligned reference position of the read on the reference genome.

reference_end points to one past the last aligned residue. Returns None if not available (read is unmapped or no cigar alignment present).

I thought all the reads were aligned but I could be wrong (I am a bit new to read mapping and DNase-seq and the data has come from a third party).

Is there any way for wellington_footprints.py to handle these problems gracefully rather than crashing? Can it just ignore these reads? This only seems to happen for reads mapped to the reverse strand: alignedread.pos is never None.

And thanks for your nice software :)

jpiper commented 8 years ago

Thanks for the bug report John!

I've noticed that pySAM has actually deprecated aend, so I'll fix that as well so pyDNase doesn't suddenly stop working.

By default, pyDNase just takes all the reads provided to it in the BAM - letting the user decide whether badly mapped/secondary alignments should be utilised. I (stupidly) never handled the situation where unmapped reads may be present in the BAM, and likewise, because not a lot of people are doing paired end alignment for DNase-seq (at least, none of the public data is), I didn't handle ignoring non-primary paired alignments.

For the minimum, I've changed it to only process mapped reads (see 4aca97feefe019f7103c0ec14f7b65844c8d359c) and I'll try to release this to master today, which should fix your issue.

For paired end reads, there's a lot of considerations. I'm thinking a command line flag to ignore secondary alignments or not, but I'll think about this a bit more before implementing something here -opinions and input welcome!

JohnReid commented 8 years ago

Great thanks for the quick response! I can't say I have any opinions on the secondary alignments as I'm a bit new to this read mapping and still trying to get the hang of it. If I suddenly form some strong opinions I'll be sure to let you know!