marcelm / dnaio

Efficiently read and write sequencing data from Python
https://dnaio.readthedocs.io/
MIT License
61 stars 9 forks source link

Highlight xopen's Zstandard support when optionally installed #123

Closed bede closed 1 year ago

bede commented 1 year ago

Hi, what a fantastically useful library this is! I particularly like that it supports stdin and Zstandard compression.

This tiny PR highlights underlying xopen's (and therefore dnaio's) existing support for .zst extensions, mentioning that either of two dependencies (zstd CLI or Zstandard PyPI package) must be installed to make it work.

bede commented 1 year ago

Indeed there was, apologies!

bede commented 1 year ago

(I don't know if the limitations section is the best place to put the notice about Zstandard dependencies, what do you think?)

marcelm commented 1 year ago

Thanks! This should definitely be mentioned in the docs. It should probably not be under "Limitations". The idea is that Zstandard support is optional: If you want it, you need to use pip install dnaio zstandard.

We delegate compression/decompression to https://github.com/pycompression/xopen/, which also has zstandard as an optional dependency. The reason for it being optional is that the zstandard wheels are quite large. See https://github.com/pycompression/xopen/pull/91#issuecomment-1287495260 (and the subsequent comment). The numbers have gotten even worse: The wheel is 5.2 MB and it needs 23 MB on disk unpacked. This could be 800 kB (2 MB unpacked) if debugging symbols were removed.

bede commented 1 year ago

Thanks, interesting, and a pity about the wheel sizes for Zstandard. Would you suggest putting the Zstandard install information (taken directly from xopen's readme) instead in an additional line beneath the compressed input and output… line of the features section? (Moved from limitations).

marcelm commented 1 year ago

How about adding an "Installation" section after "Example usage"? Just a short sentence mentioning pip install dnaio zstandard and say that zstandard can be omitted if support for .zst files is not needed.

And an addition to the "Compressed input and output" bullet point would also be good.

bede commented 1 year ago

Thanks, I've added an installation section as suggested, thoughts?

marcelm commented 1 year ago

Looks good now, thanks a lot!

bede commented 1 year ago

And thanks for developing this great library!