mirage / decompress

Pure OCaml implementation of Zlib.
MIT License
115 stars 21 forks source link

Exception raised when flushing buffers #120

Closed igarnier closed 3 years ago

igarnier commented 3 years ago

Hello, I'm using decompress to perform Gz streaming compression of some ascii log. When killing my soft, I've a cleanup function that tries to gracefully empty up the internal buffers of decompress. Note that the gzip file produced was rather large at this stage (roughly 650 megabytes).

I noticed the following exception was raised:

Invalid_argument                                                                                                
    "Optint.to_int32: 13903751204 can not fit into a 32 bits integer")        

and indeed my gzip file is truncated at the end (checked using gzip -t), meaning that there was an error during this flushing down. Here's my flushing function (mostly inspired by the code for the pipe utility as you might notice):

      let rec flush_encoder oc encoder =
        let open Gz in
        match Def.encode encoder with
        | `Await encoder ->
            (* A length of 0 signals end of input (I think). *)
            flush_encoder oc (Def.src encoder i 0 0)
        | `Flush encoder ->
            let len = io_buffer_size - Def.dst_rem encoder in
            bigstring_output oc o 0 len ;
            flush_encoder oc (Def.dst encoder o 0 io_buffer_size)
        | `End encoder ->
            let len = io_buffer_size - Def.dst_rem encoder in
            if len > 0 then bigstring_output oc o 0 len

Is this a bug, or am I doing something wrong with the API?

dinosaure commented 3 years ago

Hi, can you test your input with this PR: #121. A recent release of optint changes a bit the semantic of some function such as of_int/of_int32. This PR should safely cast values and you should not have such exception then.

dinosaure commented 3 years ago

And the code seems fine as long as you want to process only one chunk of i 👍 .

igarnier commented 3 years ago

Hi, can you test your input with this PR: #121. A recent release of optint changes a bit the semantic of some function such as of_int/of_int32. This PR should safely cast values and you should not have such exception then.

Thanks a lot! I'll try this out now.

And the code seems fine as long as you want to process only one chunk of i

This is the code I call when terminating. Can there be more than one chunk being processed at a time?

dinosaure commented 3 years ago

Yes, you need to have a refill function such as:

val refill : 'fd -> bigstring -> int -> int -> int

This function is pretty-close to [Stdlib.input]. On the Await state, decompress consumed all your input and ask to have more bytes. So in that case, you can write:

| `Await encoder ->
  let wrote = refill fd i 0 (Bigstringaf.length i) in
  flush_encoder oc (Def.src encoder i 0 wrote)

It permits to use to limit our memory consumption on:

You have an example in bin/pipe.ml which is a pipe compressor/decompressor.

igarnier commented 3 years ago

I still have a similar error on shutdown (the .gz is ~400M):

Error:
              (Invalid_argument
                "Optint.to_unsigned_int32: 8597664831 can not fit into a 32 bits integer")
igarnier commented 3 years ago

I've created an example reproducing the issue here https://gist.github.com/igarnier/cc4942547781dcc8dcf720e349387cb8

Here's a more informative backtrace

Fatal error: exception Invalid_argument("Optint.to_unsigned_int32: 4294967334 can not fit into a 32 bits integer")
Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
Called from Gz.Def.checksum.k in file "lib/gz.ml", line 719, characters 18-47
Called from Gz.Def.encode in file "lib/gz.ml" (inlined), line 916, characters 17-22
Called from Dune__exe__Decompress_test.flush_encoder in file "decompress_test.ml", line 40, characters 8-26
Called from Dune__exe__Decompress_test in file "decompress_test.ml", line 69, characters 2-63
dinosaure commented 3 years ago

I still have a similar error on shutdown (the .gz is ~400M):

Ah I was think that the error is about checksum but it's about how many bytes we wrote. I added a commit in the same PR which should fix the error: see 74e9139. Can you retest on your side? I will try to find a reproductible case.

igarnier commented 3 years ago

I just tested with the example in https://gist.github.com/igarnier/cc4942547781dcc8dcf720e349387cb8, it works now! Thanks a lot.

I'd like to add, great work with decompress! I'm glad we have this in OCaml. :heart:

dinosaure commented 3 years ago

Thanks 👍, I will cut a release next week (including several things such as the non-stream API).