google / brotli

Brotli compression format
MIT License

Add brotli mime type detection to file and libmagic #727

Open · stokito opened this issue 6 years ago

stokito commented 6 years ago

The Linux file utility doesn't recognize the brotli MIME type:

$ file -i some_archive.br
some_archive.br: application/octet-stream; charset=binary

I know this isn't related to brotli itself, but could you send a PR to file and libmagic so that they recognize the brotli MIME type?

https://github.com/file/file/blob/master/src/compress.c
https://github.com/file/file/blob/master/magic/Magdir/archive

Also, please confirm the correct brotli MIME type (#724).

eustas commented 6 years ago

There is a small problem. Most utilities read the first bytes of a file and compare them against known "magic" bytes. For brotli, however, 254 of the 256 possible byte values are valid as the first byte of a stream.
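To make the 254-of-256 figure concrete, here is a minimal sketch, assuming the WBITS encoding from RFC 7932, section 9.1: a brotli stream opens with a 1-, 4- or 7-bit code for the window size, read least-significant bit first, and only one 7-bit pattern (0010001) is reserved as invalid. The helper names `decode_wbits` and `_bit` are hypothetical. Because the WBITS code never uses more than 7 bits, the first byte alone decides whether it is valid.

```python
def _bit(value: int, i: int) -> int:
    """Return bit i of value (bit 0 = least significant, read first)."""
    return (value >> i) & 1


def decode_wbits(first_byte: int):
    """Decode WBITS (log2 of the window size) from the first byte of a brotli
    stream per RFC 7932, section 9.1. Returns None for the reserved pattern."""
    if _bit(first_byte, 0) == 0:
        return 16                          # 1-bit code
    n = _bit(first_byte, 1) | (_bit(first_byte, 2) << 1) | (_bit(first_byte, 3) << 2)
    if n != 0:
        return 17 + n                      # 4-bit code: WBITS 18..24
    m = _bit(first_byte, 4) | (_bit(first_byte, 5) << 1) | (_bit(first_byte, 6) << 2)
    if m == 1:
        return None                        # reserved pattern 0010001: invalid
    return 17 if m == 0 else 8 + m         # 7-bit code: WBITS 17 or 10..15


valid = [b for b in range(256) if decode_wbits(b) is not None]
invalid = [b for b in range(256) if decode_wbits(b) is None]
print(len(valid), [hex(b) for b in invalid])   # 254 ['0x11', '0x91']
```

Only 0x11 and 0x91 are ruled out (bit 7 is not part of the 7-bit code), which leaves a byte-prefix matcher like libmagic with almost nothing to key on.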

Brotli was designed as a stream format rather than a file format. There is a "brotli-framing-format" project that adds many features (including "magic header bytes"), but it is not public yet.

stokito commented 6 years ago

If I understand correctly, gzip could also be called a stream format, yet that didn't stop them from adding a few magic bytes.
I already used \xce\xb2\xcf\x81 as the brotli magic bytes and application/x-brotli as the MIME type, but I asked in #724 for confirmation.
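The four bytes \xce\xb2\xcf\x81 are the UTF-8 encoding of the Greek letters "βρ" (beta, rho). A prefix check of the kind libmagic performs could be sketched as below; `FRAMED_BROTLI_MAGIC` and `has_framed_brotli_magic` are hypothetical names, and the check can only succeed if the writer deliberately emits this prefix, which a plain brotli stream does not.

```python
# Hypothetical magic prefix: the four bytes quoted above, i.e. "βρ" in UTF-8.
FRAMED_BROTLI_MAGIC = b"\xce\xb2\xcf\x81"


def has_framed_brotli_magic(data: bytes) -> bool:
    """Return True if data starts with the four-byte prefix.

    A raw brotli stream carries no such prefix, so this only recognizes data
    written by a framing layer that adds it."""
    return data.startswith(FRAMED_BROTLI_MAGIC)


print("βρ".encode("utf-8") == FRAMED_BROTLI_MAGIC)                  # True
print(has_framed_brotli_magic(FRAMED_BROTLI_MAGIC + b"payload"))    # True
print(has_framed_brotli_magic(b"\x5b\x00\x00"))                     # False
```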

Where can I read anything about this framing format? Will we still be able to use regular .br files?

eustas commented 6 years ago

gzip is a wrapper around deflate streams.
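The distinction can be shown with Python's standard gzip and zlib modules: the gzip container (RFC 1952) starts with the fixed bytes 1f 8b followed by the compression method (8 = deflate), while a raw deflate stream (RFC 1951) has no signature at all. The magic bytes belong to the wrapper, not to the compressed format, which is the position raw brotli is in. The sample data below is arbitrary.

```python
import gzip
import zlib

data = b"hello, hello, hello brotli " * 10

# gzip: a small header/trailer wrapped around a deflate stream.
gz = gzip.compress(data)

# Raw deflate: negative wbits tells zlib to omit any zlib or gzip wrapper.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = compressor.compress(data) + compressor.flush()

print(gz[:3].hex())    # 1f8b08: ID1, ID2, CM (8 = deflate)
print(raw[:3].hex())   # depends on the input; no fixed signature

# Both decode back to the same bytes.
assert gzip.decompress(gz) == data
assert zlib.decompress(raw, -15) == data
```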

stokito commented 5 years ago

So the magic bytes I used actually come from a different framing format, from #462.

doronbehar commented 5 years ago

I've opened an issue about this in file's bug tracker:

https://bugs.astron.com/view.php?id=111

So @eustas, the issue is that this implementation doesn't add magic bytes even when it creates files rather than streams?

unphased commented 3 months ago

What is the state of this? Most (but not all, which doesn't look promising) of the brotli files I looked at start with the byte 0x5b...
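One possible reading of the 0x5b observation, sketched under the bit layout of RFC 7932 (sections 9.1 and 9.2): the first byte packs the window-size code together with the start of the first meta-block header (ISLAST, ISLASTEMPTY, MNIBBLES), so it shifts with encoder settings and input size instead of acting as a fixed signature. `describe_first_byte` is a hypothetical helper and only handles the 4-bit WBITS codes (WBITS 18..24).

```python
def describe_first_byte(b: int) -> dict:
    """Split the first byte of a brotli stream into header fields, assuming a
    4-bit WBITS code (RFC 7932, sections 9.1-9.2). Bits are read LSB first."""
    bits = [(b >> i) & 1 for i in range(8)]
    if bits[0] != 1 or (bits[1] | bits[2] | bits[3]) == 0:
        raise ValueError("not a 4-bit WBITS code; later fields would shift")
    fields = {"WBITS": 17 + (bits[1] | (bits[2] << 1) | (bits[3] << 2)),
              "ISLAST": bits[4]}
    pos = 5
    if fields["ISLAST"]:
        # ISLASTEMPTY is only present when ISLAST is set.
        fields["ISLASTEMPTY"] = bits[5]
        pos = 6
        if bits[5]:
            return fields              # empty final meta-block: stream ends
    code = bits[pos] | (bits[pos + 1] << 1)
    fields["MNIBBLES"] = {0: 4, 1: 5, 2: 6, 3: 0}[code]   # 0 = metadata block
    return fields


print(describe_first_byte(0x5b))
# {'WBITS': 22, 'ISLAST': 1, 'ISLASTEMPTY': 0, 'MNIBBLES': 5}
```

Under these assumptions 0x5b means a 22-bit window, a first meta-block that is also the last one, and a 5-nibble length field. A file compressed with another window size, or one whose first meta-block is not the last, would start with a different byte.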