mbhall88 / compression_benchmark

Benchmarking FASTQ compression with 'mature' compression algorithms
MIT License

Spring #7

Closed jsgounot closed 2 months ago

jsgounot commented 2 months ago

Hi,

I came across this post somewhat by chance and found it quite interesting. I'd suggest adding Spring if you plan another iteration of the benchmark in the future. NanoSpring can be ignored, as it is a lossy compressor compared to Spring. Spring works much better with Illumina reads than with Nanopore reads.

By the way, I don't know if µbioinfo is a private or public group, but if it's public I'd be happy to join!

Have a nice day, JS

mbhall88 commented 2 months ago

Glad you found it interesting. And thank you for the suggestion.

Regarding adding spring to the benchmark, I'll first reiterate something I say in the readme:

"Don't get me wrong, there are plenty of benchmarks, but they're always looking at bioinformatics-specific tools for compressing sequencing data. Sure, these perform well, but every repository I went to was untouched in a while. When archiving data, the last thing I want is to try and decompress my data and the tool no longer installs/works on my system. In addition, I want the tool to be ubiquitous and mature. I know this is a lot of constraints, but hey, that's what I am interested in."

It does look like spring has some recent-ish activity, which is nice. But I guess it's more a question of whether it will still be maintained in, say, 5 years' time, when I want to use it and potentially run into problems. As I say, I wanted this benchmark to focus on general-purpose compression algorithms rather than the specialised ones, which already have plenty of benchmarks in the literature.

If you really want spring in the benchmark I would happily receive a pull request though.

mbhall88 commented 2 months ago

As for the µbioinfo slack channel, do you work in microbial bioinformatics?

jsgounot commented 2 months ago

Yes, that makes sense. Just as general info, I observe a compression ratio of 16 with Spring on Illumina reads, so similar to what you see with xz, I assume.
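For anyone wanting to check a figure like this on their own data, here is a minimal sketch of how a compression ratio can be computed. It assumes the ratio is defined as uncompressed size divided by compressed size, which is an assumption here and not something stated in this thread. It uses Python's standard `lzma` module (the xz format) rather than Spring, and both the helper name `compression_ratio` and the synthetic FASTQ record are made up for illustration.

```python
import lzma


def compression_ratio(data: bytes, preset: int = 6) -> float:
    """Return uncompressed size / compressed size using xz (LZMA).

    Assumption: "compression ratio" means original bytes divided by
    compressed bytes, so higher is better.
    """
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = lzma.compress(data, preset=preset)
    return len(data) / len(compressed)


# Hypothetical, highly repetitive FASTQ records, purely for illustration.
# Real reads will compress far less well than this toy input.
record = b"@read1\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
fastq = record * 10_000

ratio = compression_ratio(fastq)
print(f"compression ratio: {ratio:.1f}")
```

On real data you would read the FASTQ file's bytes instead of building a synthetic record, and the number will depend heavily on the sequencing technology and the preset.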

"As for the µbioinfo slack channel, do you work in microbial bioinformatics?"

Yes, mostly metagenomic assembly and strain phasing lately.

mbhall88 commented 2 months ago

Cool! Is there an institutional email or another email I can use to invite you to the slack?

jsgounot commented 2 months ago

You can use my institutional email: jean-sebastien at gis.a-star.edu.sg. Thanks.