andreas-wilm closed this issue 8 years ago
Hey Andreas! We force the bases to be uppercase, otherwise we consider them as N's and just skip them. This was a design choice. Why would you have lowercase bases in your reads? Usually these are used for masking regions in reference sequences.
Ivan
Yes, lowercase is used in the reference for masking. But not in the reads. It just so happens that dextract (https://github.com/thegenemyers/DEXTRACTOR) outputs lower case reads and there's technically nothing wrong with it.
Ok, I see, interesting. But what if you're aligning two references with masked regions (one acting as the 'reference' and the other as a 'read')? A mapper can't differentiate between those cases without additional command line parameters. It sounds like this should be a preprocessing step: converting all bases to uppercase?
Good point. I fear, though, that some people might run into the same problem as I did without knowing what's causing it. How about making reads.upper() the default and adding a --no-auto-upper option?
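For anyone hitting this in the meantime, the preprocessing step discussed above can be sketched in Python. This is only an illustration, not GraphMap code: `uppercase_fastq` is a hypothetical helper, and it assumes strict four-line FASTQ records (no wrapped sequence lines). Only the sequence line is changed, since quality characters are case-sensitive and must stay as they are.

```python
import io


def uppercase_fastq(src, dst):
    """Copy FASTQ text from src to dst, uppercasing only the sequence lines.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, quality).
    """
    for i, line in enumerate(src):
        if i % 4 == 1:  # second line of each record is the sequence
            line = line.upper()
        dst.write(line)


# Small in-memory example; real use would pass open file handles instead.
example_in = io.StringIO("@read1\nacgtn\n+\nIIIII\n")
example_out = io.StringIO()
uppercase_fastq(example_in, example_out)
```

With real files, the same function would be called on two handles opened with `open(...)`, one for reading and one for writing.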
Since lower-case fastq files should be the exception and your point about reference alignment is valid, how about a warning message if a read is lower-case? All lower-case doesn't make sense in any setting if used for masking.
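A rough sketch of what such a check could look like, written in Python purely for illustration (GraphMap's own implementation is not shown in this thread). The function name `warn_if_all_lowercase` and the message text are made up for the example. It flags only reads that are entirely lowercase, which is the case that can't be explained by masking.

```python
import warnings


def warn_if_all_lowercase(name, seq):
    """Warn and return True when every cased character in seq is lowercase.

    Hypothetical helper: mixed-case reads (possible masking) are not flagged.
    """
    # str.islower() is True only if there is at least one cased character
    # and all cased characters are lowercase.
    if seq.islower():
        warnings.warn(
            f"read {name!r} is entirely lowercase; "
            "consider converting it to uppercase before mapping"
        )
        return True
    return False
```

Calling `warn_if_all_lowercase("read1", "acgtacgt")` would emit the warning, while a mixed-case or uppercase read would pass silently.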
The new version (v0.3.0) now converts all sequences to upper-case by default (there are no special command line parameters to turn this on or off).
@isovic Sorry to revive this old ticket but I found that GraphMap (v0.5.2) still treats lowercase nucleotides differently from uppercase. I tested alignments before and after converting the input reads to uppercase and found a substantial difference in the output. I would have expected them to be identical given what you've stated above. Just thought I'd let you know!
Hi Ivan,
I tried to map a (PacBio) FastQ file with lower case reads (produced by dextract) but none of these reads were mapped by GraphMap (as opposed to Blasr). If I uppercase them, all map. I think I saw an uppercase function for indexing of the reference. But what about reads?
Andreas