utk-se / WorldSyntaxTree

Language-agnostic parsing of World of Code repositories

Move db writer to a separate process with a single queue #32

Open robobenklein opened 3 years ago

robobenklein commented 3 years ago

All worker processes will append the docs they want to insert to a single queue, which is consumed by one writer process.

This helps prevent write-write conflicts and locking over DB updates, which could otherwise result in dedup conflicts.

Would also simplify writing to file if all docs written pass through the queue.

Writer process will need to be extremely efficient though in order to handle thousands of docs/second.
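
A minimal sketch of the single-queue layout described above, using only the standard library. Every name here (`worker`, `writer`, `insert_batch`, `SENTINEL`, `run_pipeline`) is hypothetical and not taken from the WorldSyntaxTree code. The `insert_batch` callable stands in for whatever bulk insert the DB driver offers, and batching is an assumption about how the writer might keep up with thousands of docs/second.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-stream marker; each worker sends one


def worker(doc_queue, worker_id, n_docs):
    """Stand-in for a parsing worker: enqueue docs instead of writing them."""
    for i in range(n_docs):
        doc_queue.put({"_key": f"{worker_id}-{i}", "worker": worker_id})
    doc_queue.put(SENTINEL)  # tell the writer this worker is finished


def writer(doc_queue, n_workers, insert_batch, batch_size=1000):
    """Consume docs until every worker has sent its sentinel.

    `insert_batch` is a hypothetical callable that performs one bulk insert.
    Returns the number of docs handed to it.
    """
    batch, finished, written = [], 0, 0
    while finished < n_workers:
        doc = doc_queue.get()
        if doc is SENTINEL:
            finished += 1
            continue
        batch.append(doc)
        if len(batch) >= batch_size:
            insert_batch(batch)
            written += len(batch)
            batch = []
    if batch:  # flush whatever is left after the last sentinel
        insert_batch(batch)
        written += len(batch)
    return written


def _writer_main(doc_queue, n_workers, result_queue):
    """Entry point of the writer process; an in-memory list replaces the DB."""
    stored = []
    result_queue.put(writer(doc_queue, n_workers, stored.extend, batch_size=100))


def run_pipeline(n_workers=4, docs_per_worker=250):
    """Start one writer and several workers; return how many docs were written."""
    doc_queue, result_queue = mp.Queue(), mp.Queue()
    writer_proc = mp.Process(
        target=_writer_main, args=(doc_queue, n_workers, result_queue)
    )
    writer_proc.start()
    workers = [
        mp.Process(target=worker, args=(doc_queue, w, docs_per_worker))
        for w in range(n_workers)
    ]
    for proc in workers:
        proc.start()
    written = result_queue.get()  # read the result before joining
    for proc in workers:
        proc.join()
    writer_proc.join()
    return written
```

Call `run_pipeline()` from under an `if __name__ == "__main__":` guard so it also works with the spawn start method. Note that every doc is pickled and sent through a pipe on its way to the writer, which is a per-document cost the queue itself adds.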

robobenklein commented 3 years ago

Inserts that fail will be re-added to a retry queue along with a list of the errors they produced. When that list for some doc becomes too large, stop the entire procedure, since writes are failing.
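
The retry behaviour could look roughly like the sketch below. It is an illustration only: `write_with_retries`, `WritesFailing`, `insert_one`, and the threshold of 3 errors are all assumed names and values, not part of the project.

```python
from collections import deque

MAX_ERRORS_PER_DOC = 3  # hypothetical threshold for "too large"


class WritesFailing(RuntimeError):
    """Raised when a single doc has failed too many times."""


def write_with_retries(docs, insert_one, max_errors=MAX_ERRORS_PER_DOC):
    """Insert each doc; failed docs are re-queued with their error history.

    `insert_one` is a hypothetical callable that raises on a failed insert.
    Returns the number of docs written.
    """
    retry_queue = deque((doc, []) for doc in docs)
    written = 0
    while retry_queue:
        doc, errors = retry_queue.popleft()
        try:
            insert_one(doc)
        except Exception as exc:
            errors.append(repr(exc))
            if len(errors) >= max_errors:
                # Too many failures for one doc: stop the whole procedure.
                raise WritesFailing(
                    f"doc {doc!r} failed {len(errors)} times: {errors}"
                ) from exc
            retry_queue.append((doc, errors))  # retry later, errors attached
        else:
            written += 1
    return written
```

Re-queueing at the back means a failing doc is retried only after the other pending docs have had their turn, which gives a transient lock conflict some time to clear.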

robobenklein commented 3 years ago

This is a bad idea.

Don't use a single IPC queue for passing documents: performance is worse than my house connection to the DB...

Need some method to write documents from each worker in parallel. Maybe FileLock is performant enough? Until then, manual checks on arango it is.
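
One way the lock-based alternative might look, assuming the target is a shared output file that every worker appends to directly. This sketch uses `fcntl.flock` from the standard library (POSIX only) as a stand-in for the FileLock idea mentioned above; it is not the FileLock package itself, and `append_docs_locked` is a hypothetical helper.

```python
import fcntl
import json


def append_docs_locked(path, docs):
    """Append docs as JSON lines while holding an exclusive advisory lock.

    Each worker calls this directly, so no doc has to cross an IPC queue.
    Returns the number of docs appended.
    """
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another process holds it
        try:
            for doc in docs:
                fh.write(json.dumps(doc) + "\n")
            fh.flush()  # hand buffered lines to the OS before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return len(docs)
```

Passing a batch of docs per call keeps the number of lock acquisitions low. Whether this is fast enough in practice would need measuring.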