konklone / oversight.garden

Bringing together the oversight community's work.
https://oversight.garden
Creative Commons Zero v1.0 Universal

Indexing: fork worker processes #115

Closed divergentdave closed 8 years ago

divergentdave commented 8 years ago

This fixes #101. Loading text and indexing reports is now split off into a worker process, managed with the worker-farm module. If the worker process crashes during GC, a new process is started and the job is retried. The worker process is also restarted after every 1000 documents to limit overall memory usage.
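For readers who want the control flow spelled out, here is a minimal sketch of the retry-and-recycle loop described above. It is not the code from this PR and does not use worker-farm's API: `spawnWorker` is a hypothetical factory returning an object with `run(doc)` and `stop()`, and the `maxAttempts` cap is an assumption (the description only says the job is retried).

```javascript
// Sketch only. `spawnWorker`, `run`, `stop`, and `maxAttempts` are hypothetical
// names for illustration; they are not taken from worker-farm or from this PR.
async function indexAll(docs, spawnWorker, options = {}) {
  const recycleEvery = options.recycleEvery ?? 1000; // documents per worker lifetime
  const maxAttempts = options.maxAttempts ?? 3;      // assumed cap on tries per document

  let worker = spawnWorker();
  let handled = 0; // documents completed by the current worker
  const results = [];

  for (const doc of docs) {
    for (let attempt = 1; ; attempt += 1) {
      try {
        results.push(await worker.run(doc));
        break;
      } catch (err) {
        // Treat any failure as a crashed worker: discard it before deciding what to do.
        worker.stop();
        if (attempt >= maxAttempts) throw err;
        // Start a fresh worker and retry the same document.
        worker = spawnWorker();
        handled = 0;
      }
    }

    handled += 1;
    if (handled >= recycleEvery) {
      // Recycle the worker so memory it accumulated is released with the process.
      worker.stop();
      worker = spawnWorker();
      handled = 0;
    }
  }

  worker.stop();
  return results;
}
```

Keeping the restart logic in the parent means a crash costs at most one retried document, and the recycle threshold bounds how much memory any single worker can accumulate.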

Since the worker processes don't inherit command line arguments, I also changed the code so that workers no longer load the config file themselves. Instead, the parent process passes its config object to the worker process.
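A minimal sketch of that hand-off, under stated assumptions: the function names and the `indexName` and `reportPath` fields are hypothetical, and the JSON round trip stands in for the process boundary (Node's default child-process IPC channel serializes messages as JSON, so the config has to be plain data).

```javascript
// Sketch only. All names here (`buildJob`, `handleJob`, `indexName`,
// `reportPath`) are hypothetical and not taken from this PR.

// Parent side: attach the already-loaded config object to each job.
function buildJob(config, reportPath) {
  return { config, reportPath };
}

// Stand-in for the process boundary: only JSON-serializable data survives.
function sendAcrossBoundary(message) {
  return JSON.parse(JSON.stringify(message));
}

// Worker side: read settings from the job, never from process.argv or a config file.
function handleJob(job) {
  const { config, reportPath } = job;
  return `index ${reportPath} into ${config.indexName}`;
}

const job = sendAcrossBoundary(buildJob({ indexName: "reports" }, "a/b.txt"));
console.log(handleJob(job));
```

Because the worker depends only on the job payload, it behaves the same no matter how the parent was invoked.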

konklone commented 8 years ago

This is super solid, and for the kind of refactor it is, it's not a major code change. Nice work!

divergentdave commented 8 years ago

Thanks! I had to do it the wrong way first, but I'm happy with how it turned out.