Open luizirber opened 4 years ago
For bgzip detection https://github.com/samtools/tabix/blob/master/bgzf.c#L66
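A minimal sketch of what that check could look like in Rust, assuming the usual BGZF layout: a gzip header with the FEXTRA flag set and a `BC` extra subfield of length 2 starting at byte 12. The function name `is_bgzf` is hypothetical and not part of niffler's API. It also assumes the `BC` subfield comes first in the extra field, which is what bgzip writes.

```rust
/// Returns true if `buf` starts with what looks like a BGZF block header.
/// Hypothetical helper, not an existing niffler function.
fn is_bgzf(buf: &[u8]) -> bool {
    buf.len() >= 16
        && buf[0] == 0x1f               // gzip magic, first byte
        && buf[1] == 0x8b               // gzip magic, second byte
        && buf[2] == 8                  // compression method: deflate
        && (buf[3] & 4) != 0            // FLG.FEXTRA is set
        && buf[12..16] == [b'B', b'C', 2, 0] // subfield id "BC", length 2
}
```

A plain gzip file fails the FEXTRA test, so it would still be reported as ordinary gzip.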
https://github.com/gwihlidal/smush-rs has a different API (no Read/Write), but shares some of the ideas and supports more formats.
We can mention smush-rs in the README and be inspired by it, but we can't really use it as a backend (maybe discuss with the author to see if we can merge our projects).
Brotli: https://github.com/dropbox/rust-brotli
It already has the Read and Write traits implemented, so it should be easy to add.
What about tar wrapped files and zips?
I'm not opposed in principle. But there are many problems:
How do we manage multiple content files: do we concatenate them? How do we determine the file order? How do we represent the folder structure? If we start managing all this, we're leaving niffler's scope.
We could just do the format detection part. But a tar file, for example, is a wrapper that concatenates files and records the folder structure, and is then compressed in the chosen format. To detect that it's a tar file, you'd have to detect the compression, then open the file with the right decompressor and read the beginning to check that it's a tar file. None of this is technically impossible, but I'm not sure we'd still be within niffler's scope.
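For the last step (checking the beginning of the decompressed stream), a rough sketch could look like the following. It assumes the archive is in ustar or GNU tar format, which both store the bytes `ustar` at offset 257 of the first 512-byte header; old v7 tar archives have no such marker and would be missed. `looks_like_tar` is a hypothetical name, and the decompression itself is left out.

```rust
/// Checks whether already-decompressed bytes look like the start of a
/// ustar/GNU tar archive. Hypothetical helper; the caller must first
/// decompress at least the first 262 bytes of the stream.
fn looks_like_tar(decompressed: &[u8]) -> bool {
    const MAGIC_OFFSET: usize = 257; // position of the magic field in the header
    const MAGIC: &[u8] = b"ustar";

    decompressed
        .get(MAGIC_OFFSET..MAGIC_OFFSET + MAGIC.len())
        .map_or(false, |field| field == MAGIC)
}
```

This illustrates the cost mentioned above: the check cannot run on the raw file, only on output from the right decompressor.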
Check what other crates are available and do the sniffing for additional formats. Some examples: https://stackoverflow.com/questions/19120676/how-to-detect-type-of-compression-used-on-the-file-if-no-file-extension-is-spe
(suggested by @chrisgulvik on Slack, thanks!)
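As an illustration of that kind of sniffing, here is a small sketch that matches the leading bytes of a buffer against well-known signatures. The `Sniffed` enum and `sniff` function are hypothetical names, not niffler's actual API, and the list of formats is only a sample. Brotli is absent on purpose: its stream has no magic number, so it cannot be detected this way.

```rust
/// Formats recognized by this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Sniffed {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Lz4,
    Zip,
    Unknown,
}

/// Guesses the format from the first bytes of `buf`.
fn sniff(buf: &[u8]) -> Sniffed {
    match buf {
        [0x1f, 0x8b, ..] => Sniffed::Gzip,
        [b'B', b'Z', b'h', ..] => Sniffed::Bzip2,
        [0xfd, b'7', b'z', b'X', b'Z', 0x00, ..] => Sniffed::Xz,
        [0x28, 0xb5, 0x2f, 0xfd, ..] => Sniffed::Zstd,
        [0x04, 0x22, 0x4d, 0x18, ..] => Sniffed::Lz4, // LZ4 frame format
        [b'P', b'K', 0x03, 0x04, ..] => Sniffed::Zip, // local file header
        _ => Sniffed::Unknown,
    }
}
```

Only a handful of bytes are needed, so the caller can peek at the start of the stream and then hand the whole input to the matching decoder.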