yjh0502 / erl-brotli

Brotli encoder NIF for Erlang
Other
29 stars 17 forks source link

Error on encoding large binaries #11

Closed bblaszkow06 closed 2 years ago

bblaszkow06 commented 2 years ago

Using version v0.3.0 the library returns an :error atom when trying to encode larger binary (in my case somewhere between 1M and 2M). This breaks https://github.com/hauleth/phoenix_bakery when a large JS bundle is compressed

Steps to reproduce (in Elixir)

First create a file

head -c 10M </dev/urandom >myfile   

then

iex(1)> Mix.install([:brotli])
:ok
iex(2)> f=File.read!("myfile")                                  
<<120, 120, 3, 37, 247, 145, 138, 157, 219, 235, 102, 30, 192, 84, 58, 149, 19,
  72, 76, 103, 47, 203, 12, 251, 96, 128, 89, 26, 4, 194, 36, 47, 46, 161, 11,
  144, 36, 248, 204, 55, 28, 221, 57, 160, 156, 106, 15, 38, 149, 242, ...>>
iex(3)> res = :brotli.encode(binary_part(f, 0, 1*1024*1024))    
{:ok,
 <<171, 255, 255, 15, 120, 120, 3, 37, 247, 145, 138, 157, 219, 235, 102, 30,
   192, 84, 58, 149, 19, 72, 76, 103, 47, 203, 12, 251, 96, 128, 89, 26, 4, 194,
   36, 47, 46, 161, 11, 144, 36, 248, 204, 55, 28, 221, 57, 160, ...>>}
iex(4)> res = :brotli.encode(binary_part(f, 0, 2*1024*1024))
:error

Suspected reason

The responsible atom comes from here: https://github.com/yjh0502/erl-brotli/blob/9582135327ca81d420b5ccb5ba9eff871a49076c/src/brotli.erl#L65 My guess is that the encoder still has some data left and :brotli_nif.encoder_take_output has to be called more than once. From Brotli docs:

Finishing the stream means encoding of all input passed to encoder and adding specific "final" marks, so stream decoder could determine that stream is complete. To perform finish set op to BROTLI_OPERATION_FINISH. Under some circumstances (e.g. lack of output stream capacity) this operation would require several calls to BrotliEncoderCompressStream. The method must be called again until both input stream is depleted and encoder has no more output (see BrotliEncoderHasMoreOutput) after the method is called.

hauleth commented 2 years ago

Will look into that.

hauleth commented 2 years ago

I think that the problem is different - brotli:encode/1 can be used with binaries that are at most 1.25 MiB (that is 1310721 bytes). Question is how we should handle such case. Whether disallow passing such huge binaries at once and force user to use streaming parsing, or rather "internally" split the input into smaller chunks and parse them separately.

marmor157 commented 2 years ago

For brotli:encode/1 I would suggest using BrotliEncoderCompress as it performs one-shot memory-to-memory compression, and is essentialy what brotli:encode/1 is now, but also add brotli:encode_stream which would internally use BrotliEncoderCompressStream and would allow to handle output in chunks. But to do that there would have to be a way to handleavailable_in, next_in, next_out and total_out which are used to resume compression

hauleth commented 2 years ago

@marmor157 the problem is that BrotliEncoderCompress supports only small set of the options that brotli:encode/2 supports. Technically I could do some runtime checks to pick proper implementation, however that was easier.

hauleth commented 2 years ago

@marmor157 another reason why I prefer using Erlang implementation for compressing via brotli:encode/{1,2} is that it create less stress on the schedulers (even dirty). Having NIF that will run few seconds can be problematic even with dirty schedulers.

bblaszkow06 commented 2 years ago

Awesome, thanks @hauleth for the quick fix & update.