nck-2 / test-rep

0 stars 0 forks source link

Different notes from writting internal API doc #1281

Closed githubmanticore closed 1 year ago

githubmanticore commented 1 year ago

There are several small issues or errors, see no need to make separate issue on each of them, let them be in one place.

  1. [x] document not yet documented commands of API

  2. [x] may be upgrade update command to support 64 bits?

  3. [ ] for keywords make available sort-by docs/hits

  4. [ ] if persist is not available, may be let execute one(!) next command?

  5. [ ] for console status body is send as '1' and is marked as 'dummy body'. That is not true! That is global status request!

  6. [ ] m.b. add predicted_time and dist_predicted time to output of Meta (globally) instead of just using it in 'status' API call.

  7. [ ] for ISphOutputBuffer - consider to refactor SendLSBDword to platform format

  8. [ ] check whether we collect and return status for ALL commands.

  9. [x] SendAPIPercolateReply has something nasty in if (bQuery) branch.

  10. [ ] Teach CallPQ to get list of indexes instead the only index.

  11. [ ] Teach CallPQ to return warning usual way, not at the end!

  12. [ ] packed factors are NOT byte-order agnostic.

  13. [ ] m.b. refactor 'search' command; upgrade major version.

A bit of explanations

  1. It is revealed that some API commands are available, but not even mentioned in API doc.
    SetFilterStringList, SetQueryFlag, SetOuterSelect, SetTokenFilter

  2. Update (plain old, by doc id selector) supports numbers update, now even strings update. But no support for 64-bits. And feature looks really easy to add, why not?

  3. Keywords command internally support sort by docs and sort by hits (see GetKeywordsSettings_t), but they're note exposed to API.

  4. Logic of API implies running only one command per connection. If we issue 'persistent' command, and no persistence is possible (max_children exceeded), connection is dropped after the try. Let's for 'persistent' command especially give possibility to exec one actual command in the same connection.

  5. For searchd --status in code body commented as 'dummy body', however actually it is for a long time is not dummy.

  6. Both API 'status' and sphinxql 'show meta' use one and same command BuildMeta. But API, in addition, returns 'predicted_time' and 'dist_predicted_time'. Let's just move both to commont BuildMeta: code will be simpler; sphinxql show meta will be enriched.

  7. SendLSBDword is written with byte-by-byte processing. Since we're on Intel, let's make just direct write of values for LSB, as it is done in BSON manipulations.

  8. Show status displays command statistics. Just check that ALL commands are mentioned there (AFAIK there are no tests whether some of them dismissed or not)

  9. It is kind of copy-paste error; 3 times checked one and same condition.

  10. Search/Update may take list of indexes. CallPQ - only if they're encapsulated in distributed index. Let's teach the command to process list of indexes also.

  11. Usual way for our api is to return warnings with SEARCHD_WARNING status code, but CallPQ invokes totally another way, and it is not looking reasonable at all.

  12. Everything is returned in predefined byte order (big-endian in general, little-endian for parts of bson and mysql). But packed factors is the only blob which is returned over network 'as is' without any sights of how it is built. The only way to guess - m.b. look into very first handshake of connection; no other hints available. So, it is quite possible to connect from ARM host to Intel agent and take a nonsense in 'packed factor' field because of no convention.

  13. Search command itself look quite monstrous for now.

    • every(!) query, including the simplest 'match' assume filling whole message with many not necessary default details as 'grouping', 'sorting', empty 'filters', 'weights', etc. That is even for client connection; agent's one require more.
    • some things are already messed, as outer_orderby - which most likely first was added to master-agent extension, but later considered to be useful for clients also, and now exists in both places simultaneously. Same true for outer_limit, but it is even more confusion, since it always present in the message, and simultaneously MAY present in master-agent extension, depended from the version AND param has_outer in common section.

So, for search it would be good to:

  1. Distinguish 'Agent' mode from 'master-agent' extensions. Agent mode actually means ONLY that you need additional columns in resultset in order to calculate full query on the master. 'Master-agent' extensions, in turn, actually not strictly necessary only to master, but (as 'outer_orderby' messing shows) may be exposed to others also. Most likely example of another extension which also would be good not only for 'master-agent' is UDF ranking stuff: that is not a problem if client would provide necessary strings (udf plugin, function and parameters) and use the functional; it is not looking as very specific 'agent-master' one. So, 'agent-master' and 'agent mode' looks like independent features, this is no reason to union them.
  2. Find a way to send only non-default fields. Say, if we send select * from * where match('test') the only thing is matching 'test'. And in this case even sphinxql would win, since it will not pass all non-necessary stuff. I considered different protocols and would give try to google's protobuf. Apart the fact that it is very compressed and versioned, it is also supported out of the box on may languages and platforms. At least for 'search' query it looks not as a big deal to play with .proto file and experiment with generated parser.
githubmanticore commented 1 year ago

➤ Stan commented:

if persist is not available, may be let execute one(!) next command?

seems good to execute next command without error? maybe just warn about that

Update (plain old, by doc id selector) supports numbers update, now even strings update. But no support for 64-bits. And feature looks really easy to add, why not?

because all API needs that 64bit update functions and 64bit numbers is not standard there

githubmanticore commented 1 year ago

➤ Aleksey N. Vinogradov commented:

p. 1 is done in go-sdk

githubmanticore commented 1 year ago

➤ Aleksey N. Vinogradov commented:

p.9 was fixed in 4795dc49