Add "pull-model" interface to decoder with a limited output buffer size

GoogleCodeExporter commented 9 years ago

The current version of the open-vcdiff decoder API can execute an
arbitrary-sized "append" operation on its OutputString -- at least up to
64MB, which is the limit on target window size.  See issue #8, which
suggests placing stricter limits on this allocation.

The author of the application that calls the decoder may wish to place a
limit on the amount of decoded data that can be returned to it after any
one call to DecodeChunk(), and so prevent the decoder from dynamically
allocating large amounts of memory.

Proposal: Add a new API to the decoder which places its output into a
supplied output buffer, and never exceeds the capacity of that buffer.

The new API will return the number of bytes that were processed from the
input chunk (which will be <= the input size), and also the number of bytes
that were placed into the output buffer (which will be <= the size of the
output buffer).

The API might look something like:

DecodeChunkToBuffer(const char** source_buffer,
                    size_t* remaining_source_length,
                    char* destination_buffer,
                    size_t* destination_length)

DecodeChunkToBuffer will read some (not necessarily all) data from the
source buffer, writing expansion to the destination_buffer, but not
exceeding the supplied destination_length.  

When the function returns, the parameters may have been modified.  The
source_buffer and remaining_source_length values are updated to reflect what
data has been processed (advancing the pointer and diminishing the length).
If not all of the data was processed, the function can be called again with
the remaining (or relocated) source buffer, once more destination_buffer
space is available, to process additional data.

The supplied value of *destination_length is the size of the supplied
destination_buffer, and the output value is the number of bytes actually
written to the supplied destination_buffer.
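
To make the in/out parameter contract concrete, here is a minimal sketch of the proposed call and a caller loop around it.  DecodeChunkToBuffer does not exist in open-vcdiff; the body below is a hypothetical stand-in that simply copies bytes instead of decoding, and DecodeAll and kBlockSize are names invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the proposed API; NOT part of open-vcdiff.
// The "decoding" is a plain byte copy so the buffer contract can be shown.
// On entry *destination_length is the capacity of destination_buffer;
// on return it is the number of bytes actually written.
bool DecodeChunkToBuffer(const char** source_buffer,
                         size_t* remaining_source_length,
                         char* destination_buffer,
                         size_t* destination_length) {
  assert(source_buffer && remaining_source_length && destination_length);
  const size_t n = std::min(*remaining_source_length, *destination_length);
  std::memcpy(destination_buffer, *source_buffer, n);
  *source_buffer += n;            // advance past the consumed input
  *remaining_source_length -= n;  // shrink the unprocessed input
  *destination_length = n;        // report the bytes written
  return true;
}

// Caller loop: drains the input through one fixed-size block, so at most
// kBlockSize bytes of freshly decoded output are held at any time.
std::string DecodeAll(const std::string& input) {
  constexpr size_t kBlockSize = 4;
  char block[kBlockSize];
  std::string result;  // stands in for "write the block to its destination"
  const char* src = input.data();
  size_t remaining = input.size();
  while (remaining > 0) {
    const size_t before = remaining;
    size_t out_len = kBlockSize;  // in: capacity, out: bytes written
    if (!DecodeChunkToBuffer(&src, &remaining, block, &out_len)) break;
    if (out_len == 0 && remaining == before) break;  // no progress; give up
    result.append(block, out_len);
  }
  return result;
}
```

The caller resets the output length to the block capacity before every call, because the same variable carries the capacity in and the byte count out.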

The interface details may vary from this example, but the critical point is
that the caller can specify how much output data to emit.  As a result, no
more than a single "block" of the decoded target file will ever have to
reside in the application's memory space.

It will be possible to fill the output buffer before processing all the
input data.  If the output buffer becomes full, the number of input bytes
processed will be smaller than the input size, and the caller will be in
charge of conserving the unprocessed input bytes and passing them along
with the next call to DecodeChunk.

The decoder may have to alter the input string to change the last
instruction's size and address.  Any instruction may be only partially
processed; for example, a COPY instruction for 5K bytes with an output
buffer of only 4K bytes.  In this case, the remaining COPY size will be
decremented to 1K and its address will be moved forward by 4K.  This may
change the size of the instruction in the input stream -- even increasing
its size in the (admittedly contrived) case where the original size was
encoded by an opcode with an implicit size value and the decremented size
is not.
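
The size and address bookkeeping for a partially processed COPY can be sketched as follows.  This is an illustration only: CopyInstruction and ExecuteCopyPartially are hypothetical names, the struct is a simplified view of an already-parsed instruction, and the sketch ignores both the re-encoding of the instruction into the input stream and COPY instructions that read from the target being produced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified view of a parsed COPY instruction; the decoder's
// real internal representation may differ.
struct CopyInstruction {
  size_t size;     // bytes still to be copied
  size_t address;  // offset in the source of the next byte to copy
};

// Executes as much of *copy as fits into `capacity` output bytes, reading
// from `source` and writing to `out`.  Returns the number of bytes emitted
// and rewrites *copy so that it describes the unprocessed remainder.
size_t ExecuteCopyPartially(CopyInstruction* copy, const char* source,
                            char* out, size_t capacity) {
  assert(copy != nullptr);
  const size_t n = std::min(copy->size, capacity);
  std::copy(source + copy->address, source + copy->address + n, out);
  copy->size -= n;     // e.g. 5K - 4K leaves 1K to copy
  copy->address += n;  // e.g. the address moves forward by 4K
  return n;
}
```

With a 5120-byte COPY and a 4096-byte output buffer, the first call emits 4096 bytes and leaves an instruction of size 1024 at address 4096; a second call with a fresh buffer finishes it.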

Original issue reported on code.google.com by openvcd...@gmail.com on 12 Sep 2008 at 6:40

GoogleCodeExporter commented 9 years ago

There is a bigger issue than unlimited appends on OutputString -- it is, after
all, user-supplied, and has a chance to handle out-of-memory conditions.
Unfortunately VCDiffStreamingDecoderImpl::decoded_target_ is a simple
std::string and it gets to buffer the entire output generated by an arbitrary
number of delta windows. If memory is at a premium (I am dealing with Windows
CE where primary process heap is easy to exhaust), the optimum approach would
be to pass in the delta stream to DecodeChunk() window-by-window which the
user has no way of doing using the public API.

Thus, while the API proposed above would resolve this issue, a simpler fix
might suffice to help with out-of-memory condition due to large
decoded_target_ size: why not [optionally] call TruncateToBeginningOfWindow()
on each delta window and not at the end of the batch?

The workaround I am going to implement for the moment is to feed very small
chunks to DecodeChunk() which has got to kill the decoder performance as a
side effect :(
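
A rough sketch of that small-chunk workaround follows.  It assumes a decoder type exposing DecodeChunk(const char*, size_t, std::string*) that returns false on error, which is how VCDiffStreamingDecoder is understood to behave; the template keeps the sketch independent of the library.  DecodeInSmallChunks and EchoDecoder are hypothetical names, and EchoDecoder is only a toy stand-in that echoes its input.  With the real decoder, the StartDecoding() and FinishDecoding() calls around the loop are still the caller's job.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Feeds `delta` to `decoder` in pieces of at most `max_chunk` bytes and
// hands each batch of decoded output to `sink` before the next call, so the
// caller's output string never accumulates across calls.
template <typename Decoder, typename Sink>
bool DecodeInSmallChunks(Decoder* decoder, const std::string& delta,
                         size_t max_chunk, Sink sink) {
  assert(decoder != nullptr);
  if (max_chunk == 0) return false;
  std::string output;
  for (size_t pos = 0; pos < delta.size(); pos += max_chunk) {
    const size_t len = std::min(max_chunk, delta.size() - pos);
    output.clear();  // drop what the sink has already consumed
    if (!decoder->DecodeChunk(delta.data() + pos, len, &output)) return false;
    sink(output);
  }
  return true;
}

// Toy stand-in used only to exercise the loop: it echoes its input and
// records the largest chunk it was handed.  It is not a VCDIFF decoder.
struct EchoDecoder {
  size_t largest_chunk = 0;
  bool DecodeChunk(const char* data, size_t len, std::string* output) {
    largest_chunk = std::max(largest_chunk, len);
    output->append(data, len);
    return true;
  }
};
```

Small input chunks bound how much delta data the decoder sees per call, not how much output one instruction can expand to, so this limits memory use only indirectly.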

Original comment by max.moto...@gmail.com on 10 Dec 2009 at 2:31

GoogleCodeExporter commented 9 years ago

Original comment by jim.rosk...@gmail.com on 12 Dec 2009 at 8:09

GoogleCodeExporter commented 9 years ago

> a simpler fix might suffice to help with out-of-memory condition due to
> large decoded_target_ size: why not [optionally] call
> TruncateToBeginningOfWindow() on each delta window and not at the end of
> the batch?

Thank you for the suggestion!  I intend to incorporate it into the next
release of open-vcdiff.

Original comment by openvcd...@gmail.com on 17 Dec 2009 at 7:20

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

The suggestion to truncate the target has been implemented, but the original 
"pull-model" suggestion is still not addressed.

Original comment by openvcd...@gmail.com on 30 Nov 2010 at 12:45

Changed state: New

Add "pull-model" interface to decoder with a limited output buffer size #12