Closed traverseda closed 3 years ago
I have a branch, pwriter, where I've started to implement this. Unfortunately it requires more knowledge of the whoosh backend than I actually have. I'm basing it on the multiproc writer, but I'm having a hard time figuring out how things are actually structured.
No longer relevant since the move to sqlite.
Right now LCARS is limited to a single search indexer. Here are the problems with the existing solutions.
It's a bit of a tricky thing to solve in a distributed system. Of course, if you're just indexing HTML, the only things we get out of being a distributed system are a way of doing RPC calls and a way of scheduling calls for the future. The overhead from the sqlite task-queue will quickly erode any performance benefit from using a task-queue.
I like using a task-queue: it makes things very simple, since we don't need real RPC mechanisms and we can scale easily. But the overhead from sqlite is ~40% (according to py-spy) for lightweight HTML documents, the kind that will probably make up the majority of the content most people index.
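For anyone following along, here is a rough sketch of what a sqlite-backed task-queue with scheduled calls can look like. This is not the LCARS implementation: the `tasks` table, its columns, and the `make_queue`, `enqueue`, and `dequeue` helpers are hypothetical names made up for illustration. The point is only to show that every task costs at least one write transaction on the way in and one on the way out, which is plausibly where the per-document overhead comes from when the document itself is cheap to index.

```python
import json
import sqlite3
import time


def make_queue(path=":memory:"):
    """Open a queue database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "run_at REAL NOT NULL)"
    )
    conn.commit()
    return conn


def enqueue(conn, payload, delay=0.0):
    """Store one task; `delay` (seconds) schedules it for the future."""
    # One write transaction per task: this is fixed cost paid per document.
    with conn:
        conn.execute(
            "INSERT INTO tasks (payload, run_at) VALUES (?, ?)",
            (json.dumps(payload), time.time() + delay),
        )


def dequeue(conn):
    """Remove and return the oldest task that is due, or None if none is due."""
    # A second write transaction per task, again independent of task size.
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM tasks "
            "WHERE run_at <= ? ORDER BY id LIMIT 1",
            (time.time(),),
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM tasks WHERE id = ?", (row[0],))
    return json.loads(row[1])
```

A worker would call `dequeue` in a loop and hand each payload to the indexer. With a single connection this is safe as written; with several worker processes sharing one database file, the select-then-delete step would need stronger locking (for example an explicit `BEGIN IMMEDIATE`), which this sketch leaves out.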