pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers simplified compressor which produces gzip-compatible output, while requiring much less resources (and providing less compression ratio of course).
Other
303 stars 82 forks source link

source_read_cb needs ability to pass source identification #24

Closed madsci77 closed 5 years ago

madsci77 commented 5 years ago

When using streaming decompression, the uzlib_uncomp structure passed to the callback contains nothing that can be easily used to associate the callback with a particular data source. If you're using a single process that always accesses a single data source this isn't a problem, but to make it reentrant it could really use a user-defined pointer.

I fixed this in my usage by adding a void * to the struct that the application can set to the corresponding file pointer, or to a pointer to a context structure defining the source.

If there's a supported way to accomplish this in the existing code, I don't see it.

pfalcon commented 5 years ago

Just use standard pattern of including struct uzlib_uncomp in a larger structure, then casting pointer received in a callback to that larger structure. See e.g. https://github.com/pfalcon/micropython/blob/pfalcon/extmod/moduzlib.c#L52

madsci77 commented 5 years ago

Thanks for the quick response. Think you could put that in the docs, if it's the recommended approach? Without reading through the code, the user has no guarantee that it's safe - the decompressor could be passing it something other than the struct the user initialized. The documentation doesn't even mention the signature of the callback function, leaving the user to get that from the header file, and there's no callback example provided. One would expect that it's probably going to simply pass a pointer to the original structure, but in the absence of an explicit guarantee, a responsible programmer has to review the calling code to be sure they're never going to be passed something else, like a temporary copy of a struct. And the same review has to happen if the library is updated.

On the subject of callback documentation, there's this comment in the header:

/* If source_limit == NULL, or source >= source_limit, this function
   will be used to read next byte from source stream. The function may
   also return -1 in case of EOF (or irrecoverable error). Note that
   besides returning the next byte, it may also update source and
   source_limit fields, thus allowing for buffered operation. */

I think this is saying that it's acceptable for the callback function to copy a block of data into the source buffer and adjust the source and source_limit fields to indicate to the caller what's been filled. Again, that's an assumption that's not safe enough to base code on without a review. Presumably the normal behavior of the library is to increment the source pointer itself; hopefully that's done before the callback, or the callback will have to take that into account. The calling code would also have to be reviewed to make sure there aren't other requirements - for example, it's easy to imagine that the decompressor would call the callback while there are still pending bytes in the current source buffer, and that replacing the source buffer entirely with a new block would cause it to fail.

An example and/or a little bit more explanation would go a long way.