Closed acnebs closed 1 year ago
Hi @acnebs ,
in general, consider that addAllAsync
is generally slower than addAll
(which is just a convenience method over add
), and it is only really needed when indexing documents can block the main thread for too long.
That said, it might make sense to add methods addOrReplaceAll
and addOrReplaceAllAsync
to MiniSearch
for this (I'll think about it). In the meanwhile, you can implement something like that yourself as:
const addOrReplaceAll = (documents) => {
for (const document of documents) {
if (ms.has(document.id)) { ms.discard(document.id) }
this.add(document)
}
}
const addOrReplaceAllAsync = (documents, options) => {
const chunkSize = (options || {}).chunkSize || 10
const acc = { chunk: [], promise: Promise.resolve() }
const { chunk, promise } = documents.reduce(({ chunk, promise }, document, i) => {
chunk.push(document)
if ((i + 1) % chunkSize === 0) {
return {
chunk: [],
promise: promise
.then(() => new Promise(resolve => setTimeout(resolve, 0)))
.then(() => {
addOrReplaceAll(chunk)
})
}
} else {
return { chunk, promise }
}
}, acc)
return promise.then(() => this.addOrReplaceAll(chunk))
}
I did not test this code, so please forgive possible bugs.
Thanks for the response! I guess I should've dug into the actual source code a bit more, for some reason I assumed that addAll
was not just a convenience wrapper for add
– thought there might be some performance benefit to using it for many documents so that the algo complexity is not just O(n).
It seems like there is still the possibility of a race condition with the current system, as the has
+ discard
check and the actual adding of the doc is not overall an atomic operation, but in practice that probably is a non-issue.
Hi @acnebs ,
yeah, technically has
+ discard
is not atomic, but on usual single-threaded JavaScript runtimes this won't matter, because nothing can occur between them. If it did matter, all sort of operations would be unsafe in JavaScript 🙂
As for addAll
, yeah there is really no optimization over looping through add
(I don't think that any meaningful optimization is possible there). Whereas removeAll
does optimize for the case when it's called without arguments (but that is not relevant here).
I will close the issue for now, as the question should be answered, but feel free to comment further, and I will reopen it if necessary.
Hi, was just wondering if there was a recommended way to handle "race conditions" when using the
addAllAsync()
method (or I suppose when usingaddAll
but in separate async defined functions). Rather than try to explain too much, here is a repro of what I'm talking about.As you can see, I try to make a method called
addSafely
which iterates through the docs I'm trying to add and removes them first, but of course this doesn't work ifaddAllAsync
is being used, or ifaddAll
are coming in from different parts of the event loop. Is there a way to solve this in userland? Seems like something that realistically needs to be handled at the library level.Thanks!