Open AbstractiveNord opened 1 year ago
Addendum: ManticoreSearch could probably do some preprocessing of queries, such as merging identical full-text queries into a single one, or re-ordering queries by their full-text operators (running queries with fewer full-text filters first may let other queries reuse the already calculated data instead of recalculating it).
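To make the "merging identical full-text queries" idea concrete, here is a minimal sketch in plain Python. It is not ManticoreSearch code: the rule layout, `matches_full_text`, and `percolate` are hypothetical names, and the matcher is a toy "all terms present" check assumed only for illustration.

```python
from collections import defaultdict

def matches_full_text(query, doc_text):
    # Toy matcher (assumption): every query term must appear in the document.
    words = set(doc_text.lower().split())
    return all(term in words for term in query.lower().split())

def percolate(rules, doc):
    """Return ids of matching rules, evaluating each distinct full-text query once."""
    groups = defaultdict(list)
    for rule in rules:
        # Normalize case and whitespace so trivially different spellings merge.
        key = " ".join(rule["query"].lower().split())
        groups[key].append(rule)

    matched = []
    for query, group in groups.items():
        if not matches_full_text(query, doc["text"]):
            continue  # a single full-text check rejects the whole group
        for rule in group:
            # Only the cheap attribute filters are evaluated per rule.
            if all(doc["attrs"].get(k) == v for k, v in rule["filters"].items()):
                matched.append(rule["id"])
    return sorted(matched)

rules = [
    {"id": 1, "query": "red shoes", "filters": {"region": "eu"}},
    {"id": 2, "query": "red  shoes", "filters": {"region": "us"}},
    {"id": 3, "query": "blue hat", "filters": {}},
]
doc = {"text": "Cheap red running shoes", "attrs": {"region": "eu"}}
result = percolate(rules, doc)
```

Rules 1 and 2 share one full-text evaluation here and differ only in the filter step, which is the pattern described in this request.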
That could work, i.e. create some structure to speed up query processing, like Luwak does. However, that structure has to be rebuilt whenever the stored queries change, which is a concern if the user changes the PQ often.
It is also worth making common subtree optimization work in the PQ.
Would that work include the common query optimization technique?
No, PQ execution via CALL PQ does not use any of these optimizations.
I don't get it: will common query optimization and common subtree optimization be implemented?
These optimizations do not work for the PQ index and the CALL PQ statement. However, it could be easier to add them to the PQ index than to implement your feature, as they are already implemented for regular indexes and the code should work in general.
I just posted my suggestions for a possible implementation of your request.
@AbstractiveNord could you share data that has multiple similar full-text queries which could benefit from the full-text optimization but currently perform slowly?
In general, I have hundreds of queries whose full-text part is absolutely identical; only the filters differ. I will prepare some data and upload it to your S3 bucket.
@AbstractiveNord
I will prepare some data and upload to your S3 bucket.
Pls ping us here when you are done with it.
I plan to prepare the data this Saturday.
Data is uploaded. Please report any problems or missing data.
Checking the cases you provided, I see that a query cache could be added without a lot of code change. However, it will not work for your case, as it can only speed up queries whose full-text part matches a cached query and whose filters match, or are a subset of, the query that was already cached. That is not your case, as you have different filters and some cases have different parts of the full-text queries.
Common subtree optimization needs a large refactoring: it requires batching the queries to capture the common part and reusing the result of the sibling matching within the batch, but the code currently processes queries separately from the queue.
I'd estimate the change needed for common subtree optimization at 20 to 40 hours, but it is still not clear whether it is applicable in the general case. As there could be different types of queries in a single batch, common subtree optimization could have no effect there.
That could also be fixed by sorting the queries when a new query is inserted, to keep similar queries together. However, keeping the query list sorted on every insert and delete operation could also slow down data population into the PQ index.
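The trade-off around keeping the query list sorted can be illustrated with a small Python sketch. This is only an assumption-laden model, not the PQ index implementation: `SortedRuleStore` is a hypothetical name, and sorting by the exact full-text string is a simplification that only puts identical (or prefix-sharing) queries next to each other.

```python
import bisect
from itertools import groupby

class SortedRuleStore:
    """Keeps (full_text, rule_id) pairs sorted so equal queries stay adjacent."""

    def __init__(self):
        self._rules = []

    def insert(self, full_text, rule_id):
        # Binary search finds the slot, but the list shift is O(n):
        # this is the extra cost paid on every insert.
        bisect.insort(self._rules, (full_text, rule_id))

    def delete(self, full_text, rule_id):
        i = bisect.bisect_left(self._rules, (full_text, rule_id))
        if i < len(self._rules) and self._rules[i] == (full_text, rule_id):
            del self._rules[i]  # also an O(n) shift

    def batches(self):
        # Adjacent rules with the same full-text part form one batch,
        # which is where a shared full-text result could be reused.
        return [(text, [rule_id for _, rule_id in group])
                for text, group in groupby(self._rules, key=lambda r: r[0])]

store = SortedRuleStore()
store.insert("red shoes", 1)
store.insert("blue hat", 3)
store.insert("red shoes", 2)
```

After these inserts, `store.batches()` groups rules 1 and 2 into a single batch regardless of insertion order, at the price of a linear-time shift per insert or delete.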
It would also be nice to speed up full-text queries without attribute filtering, though. Meanwhile, low insertion speed of PQ rules is not a problem at all. Thanks for the detailed answer.
Another approach is to use a query cache, but to run each query the first time without any filters to collect only the matched doclist, then use this cached result for all stored PQ queries with different filter settings. However, that also adds a dry run for most queries and needs an analyzer to make sure full-text passes are not run for queries that have no parts in common with the other stored queries.
The analyzer could
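A rough sketch of the dry-run-plus-analyzer idea, in plain Python rather than ManticoreSearch internals. All names (`full_text_doclist`, `percolate_cached`) and the data layout are hypothetical, and the "analyzer" is reduced to counting how many stored rules share an identical full-text part; a real one would presumably also need to detect partially overlapping queries.

```python
from collections import Counter

def full_text_doclist(query, docs):
    # Toy unfiltered full-text pass (assumption): all terms must occur in the text.
    terms = query.lower().split()
    return [doc_id for doc_id, doc in docs.items()
            if all(t in doc["text"].lower().split() for t in terms)]

def percolate_cached(rules, docs):
    """Return ({rule_id: matched doc ids}, number of full-text passes run)."""
    # "Analyzer": only full-text parts used by more than one rule are cached.
    counts = Counter(rule["query"] for rule in rules)
    shared = {query for query, n in counts.items() if n > 1}

    cache, runs, result = {}, 0, {}
    for rule in rules:
        query = rule["query"]
        if query in cache:
            doclist = cache[query]
        else:
            doclist = full_text_doclist(query, docs)  # unfiltered pass
            runs += 1
            if query in shared:
                cache[query] = doclist
        # Filters are applied on top of the (possibly cached) doclist.
        result[rule["id"]] = [
            d for d in doclist
            if all(docs[d]["attrs"].get(k) == v for k, v in rule["filters"].items())
        ]
    return result, runs

docs = {
    10: {"text": "red shoes sale", "attrs": {"region": "eu"}},
    11: {"text": "red shoes", "attrs": {"region": "us"}},
    12: {"text": "blue hat", "attrs": {"region": "eu"}},
}
rules = [
    {"id": 1, "query": "red shoes", "filters": {"region": "eu"}},
    {"id": 2, "query": "red shoes", "filters": {"region": "us"}},
    {"id": 3, "query": "blue hat", "filters": {}},
]
result, runs = percolate_cached(rules, docs)
```

In this toy setup three rules need only two full-text passes, because the two "red shoes" rules share one cached doclist and differ only in the filter step.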
Is your feature request related to a problem? Please describe.
In percolate searching, a common pattern is a huge number of identical or highly similar full-text queries that differ only in their attribute filters. Currently, ManticoreSearch does not reuse an already calculated full-text result set, which leads to inefficient resource usage and harms usability, especially if your system uses routing based on doc IDs or tags.
Describe the solution you'd like
Implement a common result set cache so that data can be reused by other percolate queries when a percolate rule has an identical full-text query or a subset of it.
Describe alternatives you've considered
Inefficient use of compute resources for the highlighted cases.
Additional context Cases:
Related discuss