Open dr-yoon opened 1 year ago
Hey! Good idea. Do you mean for the output only? There seems to be a trade-off between compression level and runtime: https://medium.com/@acarroll.dna/looking-at-trade-offs-in-compression-levels-for-genomics-tools-eec2834e8b94
If you run the pipeline with default settings, MarkDuplicates followed by `samtools view` takes care of the conversion.
Do you know which `view` flag it is? I don't see it in the docs: http://www.htslib.org/doc/samtools-view.html 😱
Wow, what a super-fast reply! 😱
I think we can tweak the option described here: http://www.htslib.org/doc/samtools.html
level=INT Output only. Specifies the compression level from 1 to 9, or 0 for uncompressed. If the output format is SAM, this also enables BGZF compression, otherwise SAM defaults to uncompressed.
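In case it helps to try this outside the pipeline first, here is a rough sketch of how that option could be passed to `samtools view`. It is not how Sarek wires it up; `cram_cmd` is a hypothetical helper and the file names are placeholders. It only prints the command so you can inspect it before running anything:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sarek or samtools): prints a samtools
# command that writes CRAM at a chosen compression level.
# Arguments: level, reference FASTA, input BAM, output CRAM (all placeholders).
cram_cmd() {
  local level="$1" ref="$2" in="$3" out="$4"
  # The docs quoted above allow 1 to 9, or 0 for uncompressed.
  case "$level" in
    [0-9]) ;;
    *) echo "level must be an integer from 0 to 9" >&2; return 1 ;;
  esac
  # -C selects CRAM output, -T supplies the reference,
  # --output-fmt-option passes level=INT to the output format.
  printf 'samtools view -C -T %s --output-fmt-option level=%s -o %s %s\n' \
    "$ref" "$level" "$out" "$in"
}

# Print the command for the highest compression level.
cram_cmd 9 ref.fa sample.md.bam sample.md.cram
```

Running the printed command at a few different levels on one sample should show how much file size is gained for the extra runtime.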
😄 exactly the moment I checked my emails.
Ah great, I was only checking the subcommand docs. Sure, that should be no problem to add :)
Description of feature
Hi. I appreciate your effort in developing this pipeline. I noticed that the CRAM files generated by Sarek are much larger than those from another pipeline. For example, the same WGS sample produces a CRAM file of approximately 30 GB with Sarek compared to 10 GB with the other pipeline. It would be nice if we could adjust the compression level of the resulting CRAM files for more efficient storage. Thank you :)