mbhall88 / rasusa

Randomly subsample sequencing reads or alignments
https://doi.org/10.21105/joss.03941
MIT License

Support for subsampling alignment to uniform coverage #17

Closed IsmailM closed 4 months ago

IsmailM commented 4 years ago

Hey,

Great tool.

Are there any plans to support BAM files (ideally producing a downsampled BAM file as output)?

At the moment, if I want to do this, I would need to:

  1. convert BAM to fasta (using samtools fasta -F 4)
  2. downsample with Rasusa
  3. use the read ids in the downsampled fasta to filter my BAM
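Step 3 of the workaround above can be sketched in a few lines of Python. This is only an illustration, not part of rasusa: the function names `read_ids_from_fasta` and `filter_sam_by_ids` are hypothetical, the alignments are assumed to be available as SAM text (for example from samtools view -h), and the read names in the downsampled fasta are assumed to be identical to the QNAMEs in the BAM (i.e. unpaired reads with no name suffixes added).

```python
from typing import Iterable, Iterator, Set


def read_ids_from_fasta(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect read IDs (the first word of each '>' header line) from FASTA text."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_sam_by_ids(sam_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield SAM header lines plus every alignment whose QNAME is in keep_ids."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through so the output is still valid SAM.
            yield line
            continue
        qname = line.split("\t", 1)[0]
        if qname in keep_ids:
            yield line


if __name__ == "__main__":
    # Toy data standing in for the downsampled fasta and the original alignments.
    fasta = [">read1 some description\n", "ACGT\n", ">read3\n", "GGCC\n"]
    sam = [
        "@HD\tVN:1.6\tSO:coordinate\n",
        "read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*\n",
        "read2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tTTTT\t*\n",
        "read3\t16\tchr1\t9\t60\t4M\t*\t0\t0\tGGCC\t*\n",
    ]
    for record in filter_sam_by_ids(sam, read_ids_from_fasta(fasta)):
        print(record, end="")
```

Because the filter matches on read name, every alignment record belonging to a retained read is kept, including any secondary or supplementary entries.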

As such, it would be a lot easier if Rasusa could support BAM files :)

mbhall88 commented 4 years ago

Hi @IsmailM

I'm glad you're finding the tool useful.

Great question. I have some reservations about supporting BAM files, as they are not quite as straightforward as fastq/a. For instance, a read can have multiple entries in a BAM if it has secondary/supplementary alignments. I.e., if the random subsample chooses a secondary alignment entry, should it also have to keep the primary alignment entry?
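To make the question concrete, here is a minimal sketch of one possible policy: subsample at the level of read names rather than individual records, so that a chosen read always brings its primary, secondary and supplementary entries along. This is an illustration only and not a description of how rasusa behaves. Records are simplified to hypothetical (QNAME, FLAG) tuples; the flag bits 0x100 (secondary) and 0x800 (supplementary) are the ones defined in the SAM specification.

```python
import random
from typing import List, Tuple

SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment

Record = Tuple[str, int]  # (QNAME, FLAG), a simplified stand-in for a BAM record


def is_primary(flag: int) -> bool:
    """True if neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def subsample_by_read(records: List[Record], n_reads: int, seed: int) -> List[Record]:
    """Randomly choose n_reads read names and keep all records of those reads."""
    names = sorted({name for name, _ in records})  # sorted for reproducibility
    rng = random.Random(seed)
    chosen = set(rng.sample(names, min(n_reads, len(names))))
    return [rec for rec in records if rec[0] in chosen]


if __name__ == "__main__":
    records = [
        ("read1", 0),      # primary
        ("read1", 2048),   # supplementary entry for the same read
        ("read2", 16),     # primary, reverse strand
        ("read3", 0),      # primary
        ("read3", 256),    # secondary entry for the same read
    ]
    for name, flag in subsample_by_read(records, n_reads=2, seed=1):
        kind = "primary" if is_primary(flag) else "secondary/supplementary"
        print(name, flag, kind)
```

The trade-off is that the number of records in the output is no longer fixed by the number of reads requested, since reads with extra entries contribute more than one record.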

In the meantime, as you say, your workaround would be the best solution. The added benefit of your solution is that you can apply filtering via samtools prior to feeding into rasusa. As I have mentioned elsewhere, it is not my intention to introduce any kind of filtering options for file types in rasusa. The reason for this is that the tool would then not strictly be taking a random subsample. As such, even if I were to implement BAM support, you would likely still end up needing to do at least steps 1 and 2 from your current workflow.

Thank you for the feature request nonetheless. If, after discussions, we decide BAM support is not going to happen, I would still very much appreciate input on a code snippet I could add to the README for others trying to do the same thing as you.

mbhall88 commented 3 years ago

@IsmailM I just came across VariantBam, which I think does what you're after?

eesiribloom commented 4 months ago

I would also appreciate input on the code snippet for how to downsample from a bam and end up with a bam again (without re-aligning) :)

mbhall88 commented 4 months ago

Coincidentally, I have been thinking about this feature lately. Depending on how I go over the next few weeks, I may look at implementing it.

mbhall88 commented 4 months ago

Okay, this is implemented in v1.0.0 as the aln subcommand. Please try it out and report any issues.