manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Need to bench first. Revamp of github #793 (https://github.com/manticoresoftware/dev/issues/793). Related specifically to PQ tables. #1921

Open klirichek opened 9 months ago

klirichek commented 9 months ago

Original issue supplemented with text 'it might be better to...'

Finally, it ended up with a couple of implementation details:

  1. If the FT part of the query is empty, assume it is a full-scan query.
  2. That path is completely ignored for HTTP queries.

So, benchmarking is necessary for the following case:

a) A batch of full-scan percolate queries, big enough that a different code path can be assumed (compared to FT queries).
b) A batch of documents, including fields/attributes chosen so that a pure full-scan approach works significantly faster on them than FT + filtering. Maybe the source of #1794, transformed into a 'call pq' call, would be enough.
c) Create a couple of PQ tables. Each of them is populated with the PQs mentioned in a), but one is SphinxQL-flavoured and the other is JSON-flavoured.
d) Run the documents from b) with 'call pq' over both tables. Note the difference.
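Steps c) and d) could be timed with a small harness along these lines. This is only a sketch: `time_batches`, the `run_batch` callback and the table names are hypothetical, and `run_batch` is assumed to send the whole document batch to one table via 'call pq' using whatever client is at hand.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_batches(run_batch: Callable[[str, Sequence[str]], None],
                 tables: Sequence[str],
                 docs: Sequence[str],
                 repeats: int = 5) -> Dict[str, List[float]]:
    """Return wall-clock seconds per repeat for each table.

    run_batch(table, docs) is a hypothetical callback that is assumed to
    run the documents against the given PQ table (e.g. through 'call pq').
    """
    timings: Dict[str, List[float]] = {table: [] for table in tables}
    for _ in range(repeats):
        # Interleave the tables so that warm-up and background load
        # affect both flavours in roughly the same way.
        for table in tables:
            start = time.perf_counter()
            run_batch(table, docs)
            timings[table].append(time.perf_counter() - start)
    return timings
```

Interleaving the two tables inside each repeat, rather than running all repeats for one table first, is a design choice meant to keep cache effects from favouring either flavour.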

If there is a difference (expected in favour of the SQL-flavoured PQs), the hypothesis is right, and we need to implement the same fix as for gitlab #308 for JSON queries.

If the difference is statistically non-significant or negative, it reveals that the original fix with the predicate 'it might be better to...' is wrong, and it is better to roll back the original 'fixup' (as it actually fixes nothing, but makes the code flow more complex for no real reason).
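One possible way to decide whether the measured difference is statistically significant is a permutation test on the two timing samples. The sketch below is an assumption, not something agreed in this issue: the function name, the one-sided direction (JSON-flavoured slower than SQL-flavoured) and the number of rounds are all illustrative.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test(sql_times: Sequence[float],
                     json_times: Sequence[float],
                     rounds: int = 2000,
                     seed: int = 0) -> Tuple[float, float]:
    """Return (mean(json) - mean(sql), one-sided p-value).

    The p-value estimates how often a random relabelling of the pooled
    timings gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = mean(json_times) - mean(sql_times)
    pooled = list(sql_times) + list(json_times)
    n_sql = len(sql_times)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = mean(pooled[n_sql:]) - mean(pooled[:n_sql])
        if diff >= observed:
            hits += 1
    # Add-one smoothing avoids reporting an impossible p-value of 0.
    return observed, (hits + 1) / (rounds + 1)
```

A small p-value together with a positive difference would support the hypothesis; a large p-value or a negative difference would point towards rolling the 'fixup' back.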

sanikolaev commented 8 months ago

Related issue is https://github.com/manticoresoftware/dev/issues/793.

sanikolaev commented 2 months ago

As discussed with @klirichek, it's hard to estimate how long this task will take.