Harmonize pq and usual index ifaces #1364

Open githubmanticore opened 1 year ago

githubmanticore commented 1 year ago

For PQ indexes there are currently only two essential differences from usual indexes in how they are used:

  1. PQ can be used for its direct purpose by invoking the CALL PQ statement or the /json/pq/pq_index/_search endpoint.
  2. PQ has two schemas: one lists the stored queries, the other describes the expected schema of documents (for usual indexes these are one and the same schema). The first is revealed by the desc pq statement, the second by desc pq table.
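To make the two differences concrete, here is a toy Python model of a percolate index. It is only a sketch under heavy assumptions, not Manticore's implementation: StoredQuery, ToyPQIndex and call_pq are hypothetical names, and a stored query is reduced to a single term in a single field. It shows the reverse direction of search behind CALL PQ (a document is matched against stored queries) and the two separate schemas: the stored-query rows versus the fields that documents must have.

```python
from dataclasses import dataclass, field


@dataclass
class StoredQuery:
    """One row of the *query* schema (the kind of thing desc pq describes)."""
    id: int
    field_name: str  # full-text field the query targets, e.g. "title"
    term: str        # a single lowercase term; real PQ queries are far richer


@dataclass
class ToyPQIndex:
    # The *document* schema: fields incoming documents are expected to have
    # (the kind of thing desc pq table describes).
    doc_fields: tuple[str, ...]
    queries: list[StoredQuery] = field(default_factory=list)

    def insert(self, q: StoredQuery) -> None:
        if q.field_name not in self.doc_fields:
            raise ValueError(f"query targets unknown field {q.field_name!r}")
        self.queries.append(q)

    def call_pq(self, doc: dict[str, str]) -> list[int]:
        """Reverse search: ids of stored queries that match the given document."""
        unknown = set(doc) - set(self.doc_fields)
        if unknown:
            raise ValueError(f"fields not in document schema: {sorted(unknown)}")
        return [
            q.id
            for q in self.queries
            if q.term in doc.get(q.field_name, "").lower().split()
        ]


pq = ToyPQIndex(doc_fields=("title",))
pq.insert(StoredQuery(1, "title", "test"))
pq.insert(StoredQuery(2, "title", "demo"))
print(pq.call_pq({"title": "A test document"}))  # prints [1]
```

Everything apart from call_pq and the split between the two schemas (inserting, listing, deleting rows of stored queries) is ordinary row handling, which is the point the issue makes next.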

All other operations on PQ are essentially no different from those on usual indexes: listing stored queries, inserting new ones, deleting by filter. Internally, however, these actions are currently handled by a kind of prototype code that is limited to local PQ indexes and cannot do more advanced things, such as filtering queries by an RE2 expression.

Existing approaches

As mentioned, some of the commands are very PQ-specific and work ONLY for local PQ indexes, where the executor can check the index type immediately from the config. That is fine for 'prototype' code, where we don't want to make many checks and only need things to work. But for production it looks like a kind of over-engineering. There is absolutely no reason to have a different statement if you, say, want to delete a row from an index: 'delete from idx where id=1' differs neither in syntax nor in expected behaviour whether idx is an RT index or a PQ index.

Also, since for a distributed PQ we can't know the type of the index on the remote side, these commands simply don't work with it, and there seems to be no good reason for that, given that the usual plain calls do work at the same time. That makes things worse: you don't just have separate commands for PQ, but also different commands for 'local PQ' and 'distributed PQ'. That is not good.
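The argument above is essentially about dispatch: an operation like delete-by-filter should not branch on the index type at all. A minimal Python sketch, assuming hypothetical LocalIndex, DistributedIndex and delete_where names (none of this is Manticore code), shows how a generic interface lets a distributed index forward the same call to its agents without knowing whether they are RT or PQ.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol

Row = dict[str, object]
RowFilter = Callable[[Row], bool]


class Index(Protocol):
    def delete_where(self, pred: RowFilter) -> int: ...


@dataclass
class LocalIndex:
    # 'kind' is informational only: delete_where never looks at it, so the
    # same code path serves an RT index (rows = documents) and a PQ index
    # (rows = stored queries).
    kind: str
    rows: list[Row] = field(default_factory=list)

    def delete_where(self, pred: RowFilter) -> int:
        before = len(self.rows)
        self.rows = [r for r in self.rows if not pred(r)]
        return before - len(self.rows)


@dataclass
class DistributedIndex:
    # Agents are reached only through the generic Index interface, so the
    # distributed index never needs to know whether a remote side is RT or PQ.
    agents: list[Index]

    def delete_where(self, pred: RowFilter) -> int:
        return sum(agent.delete_where(pred) for agent in self.agents)


rt = LocalIndex("rt", [{"id": 1}, {"id": 2}])
pq = LocalIndex("pq", [{"id": 1, "query": "test"}])
dist = DistributedIndex([rt, pq])
print(dist.delete_where(lambda r: r["id"] == 1))  # prints 2
```

In this sketch a type check would only be needed for operations whose behaviour genuinely differs, such as percolating a document, not for ordinary row operations.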

So, here are the suggestions of this issue:

More about listing

As mentioned, a query can be stored via HTTP in two ways: either in 'json' form, as used for /json/search, or in 'ql' form, which follows the syntax of SphinxQL's where match('...') clause. The problem is that these two forms are distinguished only by a boolean flag attached to each stored query. They can be told apart ONLY via the HTTP interface, and ONLY through its specific branch (so this information is not available when listing distributed PQs). There they are rendered either as "ql":"clause" or as a whole separate JSON sub-object.

When inserting via SphinxQL, this flag is explicitly set to the 'ql' state, and when listing it is simply ignored. So if a listing shows, say, {"title":"test"} in the query column, it is impossible to tell whether it will run as ...where match('{"title":"test"}') or as ...where match('@title test'), since information about the query format is not available.
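The ambiguity can be reproduced with a small Python sketch. The is_json field stands in for the boolean flag described above, and effective_match is a hypothetical, heavily simplified translation that follows the issue's own example ({"title":"test"} running as match('@title test')); it is not how Manticore actually parses JSON queries.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    query: str
    is_json: bool  # the per-query format flag (name is hypothetical)


def sphinxql_row(q: StoredQuery) -> tuple[str]:
    # Listing as described above: the flag is dropped, only the text is shown.
    return (q.query,)


def effective_match(q: StoredQuery) -> str:
    """What the query would run as. Toy rule following the example above:
    a JSON object {"field": "text"} becomes '@field text'."""
    if q.is_json:
        obj = json.loads(q.query)
        return " ".join(f"@{k} {v}" for k, v in obj.items())
    return q.query


a = StoredQuery('{"title":"test"}', is_json=True)
b = StoredQuery('{"title":"test"}', is_json=False)
print(sphinxql_row(a) == sphinxql_row(b))  # prints True
print(effective_match(a))  # prints @title test
print(effective_match(b))  # prints {"title":"test"}
```

Two stored queries that behave differently produce identical listing rows, which is exactly the information loss described above.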

The suggestion here is to make this flag available via SphinxQL as well: either as a boolean column, say query_is_json, or as an enumerated column, say query_type, with the values json or ql. Any other suggestions are welcome.
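For comparison, here is a sketch of the two proposed column variants, again in Python with hypothetical names (QueryType, row_with_enum, row_with_bool). Either variant makes the two stored queries from the example distinguishable in a listing; the later discussion in this thread leaned towards encoding the type in the query itself instead of adding a column.

```python
from dataclasses import dataclass
from enum import Enum


class QueryType(Enum):
    JSON = "json"
    QL = "ql"


@dataclass(frozen=True)
class StoredQuery:
    id: int
    query: str
    query_type: QueryType


def row_with_enum(q: StoredQuery) -> tuple[int, str, str]:
    # Variant 1: an enumerated query_type column with values 'json' / 'ql'.
    return (q.id, q.query, q.query_type.value)


def row_with_bool(q: StoredQuery) -> tuple[int, str, bool]:
    # Variant 2: a boolean query_is_json column.
    return (q.id, q.query, q.query_type is QueryType.JSON)


queries = [
    StoredQuery(1, '{"title":"test"}', QueryType.JSON),
    StoredQuery(2, '{"title":"test"}', QueryType.QL),
]
for q in queries:
    print(row_with_enum(q))
# prints:
# (1, '{"title":"test"}', 'json')
# (2, '{"title":"test"}', 'ql')
```

The enumerated form leaves room for further query formats later, while the boolean form is simpler to filter on; both carry the same information for the two formats that exist today.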

More about documentation

It is quite confusing that many PQ features that are common to all indexes are described in a special section of the documentation, and that this section often has no links to the similar sections about plain indexes.

githubmanticore commented 1 year ago

➤ Sergey Nikolaev commented:

We've discussed different options in Slack, and the optimal one seems to be encoding the query type directly in the query rather than adding a new column.

githubmanticore commented 1 year ago

➤ Gloria Vinogradova commented:

Still useful