mgymrek / pybamview

Browser based application for viewing bam alignments
MIT License

Exper #23

Closed dlrice closed 10 years ago

dlrice commented 10 years ago

A minor modification to parsing the region user input. I noticed that upon first load "chr1:0" is in the region text box, so I attempted to visit "chr16:48000281", which showed no bases, while "16:48000281" showed me some reads from the BAM. To fix this, I added a regular expression that parses the input and removes any leading non-digit characters and spaces, and I also added some more splitting characters just in case.
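For readers without the diff at hand, the parsing described above might look roughly like the sketch below. This is not the code from the pull request: the function name `parse_region_stripped` and the exact set of separator characters are assumptions made for illustration.

```python
import re

def parse_region_stripped(text):
    """Hypothetical sketch of the described parsing, not the PR's actual code."""
    # Remove all whitespace from the user input.
    cleaned = re.sub(r"\s+", "", text)
    # Remove leading non-digit characters, e.g. "chr16:48000281" -> "16:48000281".
    cleaned = re.sub(r"^\D+", "", cleaned)
    # Split on a few separator characters (this set is an assumption).
    parts = [p for p in re.split(r"[:,;\-]", cleaned) if p]
    if not parts:
        return None
    chrom = parts[0]
    # Fall back to position 0 when no numeric position was given.
    pos = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0
    return chrom, pos
```

Note that stripping every leading non-digit character also discards names with no digits in them (an input such as "chrX:100" loses its chromosome entirely), which is one reason to be cautious with this approach.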

mgymrek commented 10 years ago

Thanks for looking into this! Several comments:

I’m not sure this is the desired behavior: right now the way pybamview knows which reads to display and which reference bases to display is by matching the exact string of the entered chromosome to the chromosome listed in the BAM file or fasta file, respectively. Unfortunately, some versions of reference genomes are annotated differently. For instance, some versions of the human reference genome have “chr16” whereas others have “16”. If you are looking at a non-human organism or something for which you have built a custom reference, the chromosome names can be even more complicated.

Using the regular expressions here, if my BAM actually has “chr16” rather than “16”, and I enter “chr16:48000281”, no reads will get displayed, since it will try to match “16” and find no reads. Similarly, if the fasta reference has “>chr16” rather than “>16”, no reference bases will get displayed. Therefore, the chrom entered for the region has to exactly match what’s in the BAM file and, if using a reference genome, what’s in the reference fasta.
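A small illustration of the mismatch, as a sketch only: the list `bam_refs` and the helper `has_exact_match` are made up for this example and are not taken from pybamview.

```python
import re

def has_exact_match(chrom, names):
    """Return True only if chrom is identical to one of the listed names."""
    return chrom in names

# Hypothetical chromosome names as they might appear in a BAM header.
bam_refs = ["chr1", "chr16"]

entered = "chr16"
# What the lookup would receive after stripping leading non-digit characters.
stripped = re.sub(r"^\D+", "", entered)

print(has_exact_match(entered, bam_refs))   # the name as typed is found
print(has_exact_match(stripped, bam_refs))  # the stripped name "16" is not
```

The same reasoning applies in reverse: a BAM whose header lists “16” would only match the stripped form, so no single normalization works for every file.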

However, you make a good point that setting the default to chr1:0 is misleading, since users will assume chromosome names look something like “chr1” rather than “1”. A better default would be to peek at the BAM file to see which chromosomes are actually present in the file, and use one of those as the default starting position. It would, however, need to handle cases in which it doesn’t find any chromosomes (e.g. none are listed in the header and all reads are unaligned) and choose some reasonable default in that case.
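One way this could look, as a minimal sketch rather than the eventual implementation: the function name `default_region` and the choice of "chr1:0" as the fallback are assumptions. The input is simply the sequence of chromosome names from the BAM header; if pysam is used to read the file, `AlignmentFile.references` provides such a sequence.

```python
def default_region(reference_names, fallback="chr1:0"):
    """Pick a starting region from the chromosome names found in a BAM header.

    reference_names: sequence of names from the header (may be empty).
    fallback: region to use when no names are available (an assumed default).
    """
    for name in reference_names:
        if name:
            # Use the first non-empty name exactly as it is written in the file.
            return "%s:0" % name
    return fallback

# Hypothetical usage with pysam (the path is a placeholder):
#   import pysam
#   with pysam.AlignmentFile("example.bam", "rb") as bam:
#       region = default_region(bam.references)
```

Because the name is copied verbatim from the header, the default works whether the file uses “chr1”, “1”, or a custom naming scheme.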

Let me know if that makes sense.

dlrice commented 10 years ago

Yes, that makes a lot of sense, thanks for explaining. I totally agree with the suggested improvement of actually looking at the BAM file to see what is at the first read (if anything), as discussed in #27.

mgymrek commented 10 years ago

Great. I addressed this in #29 so I'm closing this. Thanks again for pointing it out.