I was wondering if anyone has recommendations for using tombo resquiggle to align fast5 files to a subset of a reference genome rather than to the whole reference. We have been doing adaptive sampling on a species with a large genome to enrich for a small fraction of it, and we now want to map the data to our reference. To save time, we were hoping to map only to this small subset rather than to the whole genome. However, as one might expect, with the default parameters we are getting far more reads mapping to this subset than we expect (even accounting for good enrichment from adaptive sampling). This is almost certainly because reads that would map better elsewhere in the genome map only "okay" somewhere in our subset and get mis-mapped there instead of being thrown out.
To get around this, I see two options:
1) Just map to the whole reference genome. I'm hoping to avoid this, as it would be very time-consuming and computationally intensive.
2) Adjust the resquiggle/minimap parameters so that only reads mapping much better than under the default settings are kept. My intuition is simply to lower the signal-matching-score threshold, but I was wondering whether there are other parameters worth tweaking as well to prevent off-target reads from mapping to our subset reference.
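A rough sketch of what a stricter read filter for option 2 could look like, assuming the basecalled reads are first aligned to the subset reference with minimap2 (for example through its `mappy` Python binding) and only the reads that pass are handed on to resquiggle. This is not a tombo option: the `Hit` class, `passes_strict_filter`, `filter_reads`, and every threshold value are hypothetical, chosen only for illustration. The field names mirror attributes of `mappy.Alignment` (`q_st`, `q_en`, `mlen`, `blen`, `mapq`, `is_primary`). Note that MAPQ only reflects ambiguity *within* the subset reference, so it cannot flag a read whose true origin is outside the subset; the aligned fraction of the read and the alignment identity are what catch reads that only partially or loosely match.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class Hit:
    """Hypothetical stand-in for one alignment; field names mirror mappy.Alignment."""
    q_st: int          # query (read) start of the alignment, 0-based
    q_en: int          # query end, exclusive
    mlen: int          # number of matching bases in the alignment
    blen: int          # alignment block length (matches + mismatches + gaps)
    mapq: int          # mapping quality reported by the aligner
    is_primary: bool = True


def passes_strict_filter(hit: Hit, read_len: int, min_mapq: int = 30,
                         min_aligned_frac: float = 0.8,
                         min_identity: float = 0.85) -> bool:
    """Keep a hit only if it covers most of the read with high identity.

    The thresholds are illustrative placeholders, not recommended values.
    """
    if not hit.is_primary or read_len <= 0 or hit.blen <= 0:
        return False
    aligned_frac = (hit.q_en - hit.q_st) / read_len  # share of the read that aligned
    identity = hit.mlen / hit.blen                   # matches per aligned column
    return (hit.mapq >= min_mapq
            and aligned_frac >= min_aligned_frac
            and identity >= min_identity)


def filter_reads(reads: Iterable[Tuple[str, int, Sequence[Hit]]]) -> List[str]:
    """Return the IDs of reads whose primary hit passes the strict filter.

    Each item is (read_id, read_length, hits); reads with no hits are dropped.
    """
    kept = []
    for read_id, read_len, hits in reads:
        primary = [h for h in hits if h.is_primary]
        if primary and passes_strict_filter(primary[0], read_len):
            kept.append(read_id)
    return kept
```

With made-up example reads, a read that aligns almost end to end at good identity is kept, while one where only a short stretch aligns (the "maps okay somewhere in the subset" case) is dropped, e.g. `filter_reads([("on_target", 10000, [Hit(0, 9800, 8800, 9900, 60)]), ("partial", 10000, [Hit(2000, 4000, 1800, 2050, 60)])])` keeps only `"on_target"`.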