nsoft / jesterj

Document Ingestion Framework for Search Systems
Apache License 2.0

Documents Not Reprocessed on Restart #153

Closed nsoft closed 4 years ago

nsoft commented 4 years ago

Due to a "Cut"/"Paste"/"Did not modify" error, the statements for finding documents in processing and errored statuses on startup were overwritten by the statement for finding batched documents. This means that a document that errored out would never be retried, and a document in flight at shutdown would not be processed on the next startup. Upon fixing this, it will also be important to add an error counter to ensure that documents that repeatedly error out do not accumulate without bound, creating ever-increasing startup penalties. Docs with 3 previous errors should be transitioned to DEAD instead.

nsoft commented 4 years ago

Hmm, looking deeper, there are additional problems, and even batched documents were not reprocessed. I'm adding a default scan operation as part of this fix, and will then correct individual scanners in other tickets.
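
For readers following along, here is a minimal sketch of the startup behavior this issue asks for: documents left in processing, batched, or errored status are selected for another attempt, and documents that have already errored 3 times are moved to DEAD instead. This is not the JesterJ implementation. The class, enum, field, and method names (`RestartRecoverySketch`, `DocRecord`, `selectForReprocessing`, `recordError`) are all hypothetical, and an in-memory list stands in for whatever store actually holds document statuses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the JesterJ code base.
class RestartRecoverySketch {

  // Statuses named in the issue, plus INDEXED as an assumed "finished" state.
  enum Status { PROCESSING, ERROR, BATCHED, INDEXED, DEAD }

  // Threshold taken from the issue text: 3 previous errors means DEAD.
  static final int MAX_ERRORS = 3;

  // Assumed shape of a per-document status record.
  static final class DocRecord {
    final String id;
    Status status;
    int errorCount;

    DocRecord(String id, Status status, int errorCount) {
      this.id = id;
      this.status = status;
      this.errorCount = errorCount;
    }
  }

  // Called whenever processing of a document fails.
  static void recordError(DocRecord record) {
    record.errorCount++;
    record.status = Status.ERROR;
  }

  // Run once at startup. Each unfinished status is checked on its own branch,
  // so a lookup for one status cannot silently replace the others.
  static List<String> selectForReprocessing(List<DocRecord> records) {
    List<String> toRetry = new ArrayList<>();
    for (DocRecord record : records) {
      switch (record.status) {
        case PROCESSING: // in flight at shutdown
        case BATCHED:    // batched but never sent
          toRetry.add(record.id);
          break;
        case ERROR:
          if (record.errorCount >= MAX_ERRORS) {
            // Stop retrying so the startup work list cannot grow without bound.
            record.status = Status.DEAD;
          } else {
            toRetry.add(record.id);
          }
          break;
        default:
          // INDEXED and DEAD documents need no further work.
          break;
      }
    }
    return toRetry;
  }
}
```

In this sketch a document with one or two recorded errors is retried on the next startup, while a third error takes it out of the retry list for good, which keeps the startup scan bounded.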