Fixes a bug where the scan/in-progress-document buffer could be wrongly discarded whenever it exceeded the max document size. If `scan()` found a document (i.e., `end > 0`), we should not discard the in-progress document (`s->doc`): whatever remains in `s->doc` has not been scanned yet, so we don't know whether it contains a document of valid length. This can happen if `bufsize` is much larger than `maxdocsize`, because a single read pulls a huge amount of data out of the file, and the data left in the buffer after a scan can then exceed the max document size.
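For illustration only, here is a minimal sketch of the discard decision described above. It is not the actual patch: `should_discard`, `remaining`, and the parameter types are hypothetical names chosen for this example, and only `end` and `maxdocsize` correspond to identifiers mentioned in the description.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): decide whether the data left
 * in the in-progress buffer should be thrown away.
 *
 *   end        - result of the last scan; > 0 means a document was found
 *   remaining  - bytes left in the in-progress buffer after that scan
 *   maxdocsize - maximum allowed document size
 */
static bool should_discard(long end, size_t remaining, size_t maxdocsize)
{
    if (end > 0) {
        /* A document was just found, so the remainder has not been
         * scanned yet. It may hold further valid documents: keep it. */
        return false;
    }
    /* Nothing was found in the scanned data; if it has already grown
     * past the limit, it cannot be a document of valid length. */
    return remaining > maxdocsize;
}
```

With a large `bufsize`, `remaining` can exceed `maxdocsize` right after a successful scan; in this sketch the `end > 0` check is what keeps that unscanned data from being dropped.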
Thanks for the contribution!
The idea from the beginning was that documents larger than max should always be discarded, but this did not hold before either, so I believe your change makes things better!
@rickardp