ukwa / webarchive-discovery

WARC and ARC indexing and discovery tools.
https://github.com/ukwa/webarchive-discovery/wiki

Flush buffered documents when writing documents to file #265

Closed: tokee closed this pull request 2 years ago

tokee commented 2 years ago

When the documents generated by warc-indexer are written to a single file (for debugging or for later batch indexing into a Solr installation), a flush is missing. This triggers an exception and can leave the output truncated.

This pull request adds the flush in the proper place and makes some of the JavaDocs for DocumentConsumer more explicit about what will happen.
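
For readers unfamiliar with this failure mode, here is a minimal sketch of the general pattern, not the actual change in this pull request. `FileDocumentSink`, its `add`/`flush` methods and the `maxBuffered` parameter are hypothetical names made up for illustration; they are not the project's DocumentConsumer API. The idea is only that documents still held in the buffer must be written out before the underlying writer is closed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a file-writing document consumer.
// This is NOT the project's actual class; it only illustrates the pattern.
class FileDocumentSink implements Closeable {
    private final Writer out;
    private final int maxBuffered;
    private final List<String> buffer = new ArrayList<>();

    FileDocumentSink(Path file, int maxBuffered) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        this.maxBuffered = maxBuffered;
    }

    // Buffer a document; write the batch out once the buffer is full.
    void add(String document) throws IOException {
        buffer.add(document);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // Write all buffered documents, one per line, and flush the writer.
    void flush() throws IOException {
        for (String doc : buffer) {
            out.write(doc);
            out.write('\n');
        }
        buffer.clear();
        out.flush();
    }

    // Flush whatever is still buffered BEFORE closing the writer.
    // Skipping this step is what would drop the final, partial batch.
    @Override
    public void close() throws IOException {
        try {
            flush();
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("docs", ".txt");
        try (FileDocumentSink sink = new FileDocumentSink(tmp, 2)) {
            sink.add("<doc>1</doc>");
            sink.add("<doc>2</doc>");
            sink.add("<doc>3</doc>"); // stays buffered until close()
        }
        System.out.println(Files.readAllLines(tmp).size() + " documents written to " + tmp);
    }
}
```

In this sketch the third document never reaches the batch size, so it only ends up in the file because `close()` calls `flush()` first. Putting the flush inside `close()` rather than relying on callers to remember it is a design choice of the sketch, not a statement about how warc-indexer does it.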