All of the data required to run a single-split leaf search needs to sit in RAM for a few hundred milliseconds.
Right now we can limit Searcher RAM usage by limiting search concurrency.
However, the amount of RAM necessary to run a split search is difficult to foresee, and it can vary a lot.
Also, limiting search concurrency can strongly impact search latency.
We also expose ourselves to "query of death" phenomena.
Capping memory is non-trivial: if we just used a semaphore, for instance, we would expose ourselves to a possible deadlock.
We need a best-effort solution to cap search RAM usage.
- Have a global semaphore metering the Searcher's "working memory" RAM.
- Reserve 100 MB before starting the search on a split.
- Draw on that reserved 100 MB before downloading any extra data.
- If 100 MB is exceeded for a given split, try to reserve more memory from the global semaphore with a timeout of 1s. If the timeout hits, fail the search request.
- Release the reserved memory as soon as warmup is finished.
With this solution, as long as a split requires less than 100 MB, nothing bad happens.
Only pathological search requests (taking more than 100 MB per split) may fail.
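The scheme above can be sketched as a byte-counting semaphore built from std primitives. This is a minimal illustration under stated assumptions, not Quickwit's actual implementation: the `MemoryBudget` type and its methods are hypothetical names, and a production version would more likely use an async semaphore (e.g. tokio's) than block a thread.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

// Hypothetical global byte budget; names are illustrative only.
struct MemoryBudget {
    available: Mutex<u64>, // bytes still unreserved in the global budget
    cond: Condvar,
}

impl MemoryBudget {
    fn new(capacity_bytes: u64) -> Self {
        MemoryBudget {
            available: Mutex::new(capacity_bytes),
            cond: Condvar::new(),
        }
    }

    /// Try to reserve `bytes` from the global budget, waiting at most
    /// `timeout`. Returns false if the budget could not be secured in time,
    /// in which case the leaf search for this split should fail.
    fn try_reserve(&self, bytes: u64, timeout: Duration) -> bool {
        let guard = self.available.lock().unwrap();
        let (mut available, _wait_res) = self
            .cond
            .wait_timeout_while(guard, timeout, |avail| *avail < bytes)
            .unwrap();
        if *available < bytes {
            return false; // timed out before enough memory was released
        }
        *available -= bytes;
        true
    }

    /// Return `bytes` to the global budget, e.g. once warmup is finished.
    fn release(&self, bytes: u64) {
        *self.available.lock().unwrap() += bytes;
        self.cond.notify_all();
    }
}

fn main() {
    const MB: u64 = 1_000_000;
    let budget = MemoryBudget::new(1_000 * MB); // global Searcher budget

    // 1. Reserve 100 MB up front to start the search on a split.
    assert!(budget.try_reserve(100 * MB, Duration::from_secs(1)));

    // 2. If the split needs more, reserve extra with a 1s timeout;
    //    on timeout, fail this split's search instead of blocking forever.
    let extra_granted = budget.try_reserve(50 * MB, Duration::from_secs(1));

    // 3. Release everything as soon as warmup is finished.
    budget.release(100 * MB + if extra_granted { 50 * MB } else { 0 });
}
```

The timeout is what makes this best-effort rather than deadlock-prone: a reservation that cannot be satisfied fails one search request instead of stalling the whole Searcher.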