mbhall88 / rasusa

Randomly subsample sequencing reads or alignments
https://doi.org/10.21105/joss.03941
MIT License

Compress output when requested #27

Closed mbhall88 closed 3 years ago

mbhall88 commented 3 years ago

Hey @natir.

Yes, I only just switched the parsing to needletail, so I probably won't get around to switching it again anytime soon. Also, compile time isn't a major concern for me, especially since I distribute pre-compiled binaries and offer a bunch of other installation methods that mean users don't need to compile the project. I'm happy to review a PR with updated benchmarks, though.

Regarding niffler, you've made me realise somewhere along the line I have lost the compressed output functionality of this tool... Originally rasusa would infer the desired output compression from the path. I'll have to fix that.

Originally posted by @mbhall88 in https://github.com/mbhall88/rasusa/issues/25#issuecomment-898766604

It might also be a good idea to add a flag that lets the user set the compression level.
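For reference, inferring the output compression from the path could look roughly like the sketch below. This is only an illustration using the standard library; the `Compression` enum and `infer_from_path` function are hypothetical names made up for this example and are not rasusa's or niffler's actual API.

```rust
use std::path::Path;

/// Hypothetical enum for this sketch; not a type from rasusa or niffler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
}

/// Guess the desired output compression from the final file extension.
/// Only the last extension is inspected, so `reads.fastq.gz` maps to gzip.
/// The comparison is case-insensitive; unknown or missing extensions
/// fall back to uncompressed output.
pub fn infer_from_path(path: &Path) -> Compression {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("gz") => Compression::Gzip,
        Some("bz2") => Compression::Bzip2,
        Some("xz") => Compression::Xz,
        Some("zst") => Compression::Zstd,
        _ => Compression::None,
    }
}
```

With something like this, an output path such as `sample_out_R1.fastq.gz` would select gzip, while `out.fq` would be written uncompressed. An explicit flag, if one were added, would simply take precedence over the inferred value.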

natir commented 3 years ago

I think keeping the same compression format as the input is a good default behavior for the user, but adding an option to choose another one is important too.
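A rough sketch of that default, assuming the input format is detected from the first few bytes of the file (the well-known magic numbers for gzip, bzip2, xz and zstd). The names `Format`, `sniff_format` and `resolve_output_format` are hypothetical and only serve this example; this is not how rasusa or niffler is necessarily implemented.

```rust
/// Hypothetical enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Plain,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
}

/// Detect the compression format from the leading bytes of a file.
/// Anything that matches no known magic number is treated as plain text.
pub fn sniff_format(head: &[u8]) -> Format {
    if head.starts_with(&[0x1f, 0x8b]) {
        Format::Gzip
    } else if head.starts_with(b"BZh") {
        Format::Bzip2
    } else if head.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Format::Xz
    } else if head.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        Format::Zstd
    } else {
        Format::Plain
    }
}

/// Use the format the user asked for, if any; otherwise mirror the input.
pub fn resolve_output_format(requested: Option<Format>, input_head: &[u8]) -> Format {
    requested.unwrap_or_else(|| sniff_format(input_head))
}
```

Here `requested` stands in for whatever an explicit command-line option would provide: `None` keeps the input's format, and `Some(...)` overrides it.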

Audald commented 3 years ago

Hello @mbhall88, thanks for developing and maintaining rasusa.

Sorry for stepping in, I am not sure whether this is the right thread or my issue is related to what is being discussed here. According to the documentation, the output can be automatically compressed if .gz is stated in the output path during submission. However, the resulting paired-end fastq files are uncompressed. Am I missing any argument/flag?

This is the code I am using:

audald/software/miniconda3/bin/rasusa -i sample_R1.fastq.gz sample_R2.fastq.gz --coverage 0.25 --genome-size 2715853792b -o sample_out_R1.fastq.gz sample_out_R2.fastq.gz -s 189

Thanks in advance!

mbhall88 commented 3 years ago

Hey @Audald, yes, this is the exact problem this issue describes. I don't know how I removed this functionality. I will get a fix out ASAP, sorry.

Audald commented 3 years ago

Thanks for your prompt answer, @mbhall88. I can bypass the issue by adding a bgzip step in my pipeline. Therefore, there is no urgency from my end. Best regards and thanks again for the great work.

mbhall88 commented 3 years ago

This should now be fixed in version 0.5.0. I've also added some new compression CLI options.