luizirber / niffler

Simple and transparent support for compressed files.
Apache License 2.0

Additional formats/detections #16

Open luizirber opened 4 years ago

luizirber commented 4 years ago

Check what other crates are available and do the sniffing for additional formats. Some examples: https://stackoverflow.com/questions/19120676/how-to-detect-type-of-compression-used-on-the-file-if-no-file-extension-is-spe

(suggested by @chrisgulvik on slack, thanks!)
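For illustration, sniffing of the kind discussed in the linked answer can be sketched as a comparison of the first bytes of a stream against published magic numbers. This is a minimal sketch using only the standard library; the `Sniffed` enum and the `sniff_bytes` / `sniff_reader` names are hypothetical and are not niffler's API.

```rust
use std::io::{self, Read};

/// Formats recognised by this sketch (hypothetical type, not niffler's own).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Sniffed {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Unknown,
}

/// Compare the leading bytes against well-known magic numbers.
pub fn sniff_bytes(head: &[u8]) -> Sniffed {
    if head.starts_with(&[0x1f, 0x8b]) {
        Sniffed::Gzip
    } else if head.starts_with(b"BZh") {
        Sniffed::Bzip2
    } else if head.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Sniffed::Xz
    } else if head.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        Sniffed::Zstd
    } else {
        Sniffed::Unknown
    }
}

/// Read up to six bytes, sniff them, and hand back a reader that
/// replays those bytes before continuing with the rest of the stream.
pub fn sniff_reader<R: Read>(mut reader: R) -> io::Result<(Sniffed, impl Read)> {
    let mut head = [0u8; 6];
    let mut filled = 0;
    while filled < head.len() {
        let n = reader.read(&mut head[filled..])?;
        if n == 0 {
            break; // stream is shorter than the longest magic number
        }
        filled += n;
    }
    let format = sniff_bytes(&head[..filled]);
    let replay = io::Cursor::new(head[..filled].to_vec());
    Ok((format, replay.chain(reader)))
}
```

Adding a format with a fixed signature would then amount to one more branch in `sniff_bytes`; formats without a magic number cannot be detected this way.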

natir commented 4 years ago

For bgzip detection https://github.com/samtools/tabix/blob/master/bgzf.c#L66
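As a rough sketch of what that check amounts to: a BGZF block is a gzip member whose FEXTRA flag is set and whose extra field carries a `BC` subfield of length 2. The function below assumes the fixed layout that bgzip itself writes (XLEN of 6, with the `BC` subfield first). The name `is_bgzf` is hypothetical, and files that place other extra subfields before `BC` would not be recognised.

```rust
/// Return true when `header` starts with a BGZF block header.
/// Assumes the fixed 16-byte layout produced by bgzip.
pub fn is_bgzf(header: &[u8]) -> bool {
    header.len() >= 16
        && header[0] == 0x1f // gzip ID1
        && header[1] == 0x8b // gzip ID2
        && header[2] == 8 // CM: deflate
        && (header[3] & 0x04) != 0 // FLG.FEXTRA is set
        && u16::from_le_bytes([header[10], header[11]]) == 6 // XLEN
        && header[12] == b'B' // subfield SI1
        && header[13] == b'C' // subfield SI2
        && u16::from_le_bytes([header[14], header[15]]) == 2 // SLEN
}
```

Since every BGZF file is also valid gzip, a check like this would have to run before (or in addition to) the plain gzip magic-number test.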

luizirber commented 4 years ago

https://github.com/gwihlidal/smush-rs has a different API (no Read/Write), but shares some of the ideas and supports more formats.

natir commented 4 years ago

We can mention smush-rs in the README and be inspired by it, but sadly we can't use it as a backend (maybe we should discuss with the author to see if we can merge our projects).

luizirber commented 4 years ago

Brotli: https://github.com/dropbox/rust-brotli It already has the Read and Write traits implemented, so it should be easy to add.

dsully commented 1 year ago

What about tar wrapped files and zips?

natir commented 1 year ago

I'm not opposed in principle. But there are many problems:

How do we handle archives that contain multiple files: do we concatenate them? How do we determine the file order? How do we represent the folder structure? If we start managing all this, we're leaving niffler's scope.

We could just do the format detection part. But a tar file, for example, is a wrapper that concatenates files and records the folder structure, and it is then compressed in the chosen format. To detect that it's a tar file, you'd have to detect the compression, then open the file with the right decoder and read the beginning of the decompressed data to detect that it's a tar file. There's no technical obstacle to doing all this, but I'm not sure we'd still be within niffler's scope.
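To make the second step concrete, here is a minimal sketch of the tar check on an already decompressed stream. It assumes the archive uses the POSIX ustar or GNU layout, where the string `ustar` sits at byte offset 257 of the first 512-byte header block; old v7 archives carry no such marker and would be missed. The function name `looks_like_tar` is hypothetical, and the decompression step itself is left out.

```rust
use std::io::{self, Read};

/// Inspect the first 512-byte block of a decompressed stream and report
/// whether it carries the "ustar" marker at offset 257.
pub fn looks_like_tar<R: Read>(mut decompressed: R) -> io::Result<bool> {
    let mut block = [0u8; 512];
    match decompressed.read_exact(&mut block) {
        Ok(()) => Ok(block[257..].starts_with(b"ustar")),
        // Fewer than 512 bytes available: cannot be a tar header block.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Note that the check consumes the first block, so a caller that wants to keep reading the payload would need to buffer or replay those 512 bytes.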