Only discard document buffer exceeding length when entire buffer is scanned

rickardp / splitstream

Continuous object splitter for C and Python

Apache License 2.0


@rickardp

Fixes a bug where the in-progress document buffer (s->doc) could be discarded whenever it exceeded the max document size, even if part of it had not been scanned yet. If scan() finds a document (i.e., end > 0), the remainder of s->doc must not be discarded: it has not been scanned, so we don't know whether it contains a document of valid length. This can happen when bufsize is much larger than maxdocsize: each read pulls so much data out of the file that what is left in the buffer after a scan can exceed the max document size.
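The rule described above can be illustrated with a minimal sketch. This is not the splitstream source: the `State` struct, `load()`, `step()`, and the newline-based toy `scan()` are all hypothetical stand-ins, chosen only to show when the over-length discard is allowed to run.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified state; the real splitstream struct differs. */
typedef struct {
    char   doc[64];      /* in-progress data, possibly only partly scanned */
    size_t doclen;       /* number of valid bytes in doc */
    size_t maxdocsize;   /* maximum accepted document length */
} State;

/* Helper for the example: replace the buffer contents with a C string. */
static void load(State *s, const char *data) {
    size_t n = strlen(data);
    if (n > sizeof s->doc) n = sizeof s->doc;
    memcpy(s->doc, data, n);
    s->doclen = n;
}

/* Toy scanner (assumption): a document ends at the first newline.
   Returns the offset one past the document end, or 0 if none was found.
   Note that it stops at the first hit, so the rest stays unscanned. */
static size_t scan(const State *s) {
    const char *nl = memchr(s->doc, '\n', s->doclen);
    return nl ? (size_t)(nl - s->doc) + 1 : 0;
}

/* One splitter step. Returns 1 if a document was consumed, 0 otherwise. */
static int step(State *s) {
    size_t end = scan(s);
    if (end > 0) {
        /* A document was found: drop it and keep the remainder, even if
           the remainder is longer than maxdocsize, because it has not
           been scanned yet. */
        memmove(s->doc, s->doc + end, s->doclen - end);
        s->doclen -= end;
        return 1;
    }
    /* The entire buffer was scanned without finding a document. Only now
       is it safe to discard data that exceeds the size limit. */
    if (s->doclen > s->maxdocsize) {
        s->doclen = 0;
    }
    return 0;
}
```

With `maxdocsize` set to 4 and a buffer holding three short documents followed by over-length trailing data, the first three calls to `step()` each consume one document and leave the larger remainder in place; only the fourth call, which scans the whole remainder and finds nothing, discards it.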


Only discard document buffer exceeding length when entire buffer is scanned #6