opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Relax maxClauseCount limitation of terms query over IP field #16200

Open mkhludnev opened 1 month ago

mkhludnev commented 1 month ago

Is your feature request related to a problem? Please describe

Querying an IP field with a terms query can hit the maxClauseCount limit:
https://forum.opensearch.org/t/terms-search-gives-error-failed-to-create-query-maxclausecount-is-set-to-1024/21729/8

Describe the solution you'd like

Plain IP addresses can be handled efficiently by rewriting them into a bitset. IP masks with slashes (CIDR notation) are the problem, since they can only be handled with a boolean query (and combining a disjunction over many field types is really complex).

I propose splitting the IP terms into two lists, masks and concrete IPs, and handling them separately. The terms query would then apply the max clause count limit only to the number of mask values, and we could nest bool queries over the masks deeply to overcome even that.
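To illustrate the proposed split, here is a minimal sketch in Python (not OpenSearch's actual Java code). The function names `split_ip_terms` and `check_clause_budget` are hypothetical, and the default of 1024 is taken from the error message quoted in the linked forum thread. It only shows the partitioning and where the limit would apply, not the query rewriting itself.

```python
import ipaddress

# Limit quoted in the linked forum thread; assumed here only for illustration.
MAX_CLAUSE_COUNT = 1024


def split_ip_terms(terms):
    """Partition raw terms into (concrete addresses, CIDR masks)."""
    addresses, networks = [], []
    for term in terms:
        if "/" in term:
            # A mask such as "192.168.0.0/16"; strict=False tolerates host bits.
            networks.append(ipaddress.ip_network(term, strict=False))
        else:
            addresses.append(ipaddress.ip_address(term))
    return addresses, networks


def check_clause_budget(terms, max_clauses=MAX_CLAUSE_COUNT):
    """Apply the clause limit to masks only; concrete IPs are not counted."""
    addresses, networks = split_ip_terms(terms)
    if len(networks) > max_clauses:
        raise ValueError(
            f"{len(networks)} masks exceed the clause limit of {max_clauses}"
        )
    return addresses, networks
```

With this split, a request with thousands of concrete addresses and a handful of masks stays within the limit, because only the masks would become boolean clauses.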

Related component

Search:Query Capabilities

Describe alternatives you've considered

No response

Additional context

No response

sandeshkr419 commented 3 weeks ago

[Search Triage] Yes, we should review the max clause count limits, and not just for IP fields.

@mkhludnev Do you have any further recommendations on this?

mkhludnev commented 3 weeks ago

I have a two-stage plan:

  1. #16200: handle concrete IPs and masks with slashes separately. This shifts the limit to the number of masks only, allowing many concrete IPs in the values. I suppose it covers 90% of SIEM needs.

  2. Implement a 1D version of XYShapeQuery that queries as many ranges as requested. In fact, it's a kind of MultiRangeQuery from lucene.sandbox. UPD #16391: it can handle an unlimited number of ranges, but not for a DV-only field. Not sure what to do about that.
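The idea behind the second stage, matching against an arbitrary number of ranges without one boolean clause per range, can be sketched in Python as follows. This is an assumption-laden illustration, not how Lucene's MultiRangeQuery or #16391 is implemented: it merges the ranges into a sorted list and uses binary search per value. The names `build_index` and `contains` are hypothetical, and IPv4 addresses are placed in the IPv4-mapped IPv6 space so that both families share one 128-bit key.

```python
import bisect
import ipaddress

# IPv4 addresses are keyed as IPv4-mapped IPv6 (::ffff:a.b.c.d).
V4_MAPPED_PREFIX = 0xFFFF << 32


def to_key(addr):
    """Map an IPv4/IPv6 address object to a single 128-bit integer key."""
    if addr.version == 4:
        return V4_MAPPED_PREFIX | int(addr)
    return int(addr)


def build_index(cidrs):
    """Turn CIDR masks into sorted, merged, inclusive [low, high] ranges."""
    ranges = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        ranges.append((to_key(net.network_address), to_key(net.broadcast_address)))
    lows, highs = [], []
    for lo, hi in sorted(ranges):
        if lows and lo <= highs[-1] + 1:
            # Overlapping or adjacent: extend the previous range.
            highs[-1] = max(highs[-1], hi)
        else:
            lows.append(lo)
            highs.append(hi)
    return lows, highs


def contains(index, ip):
    """Check one address against all ranges with a single binary search."""
    lows, highs = index
    key = to_key(ipaddress.ip_address(ip))
    i = bisect.bisect_right(lows, key) - 1
    return i >= 0 and key <= highs[i]
```

For example, `contains(build_index(["10.0.0.0/24", "192.168.0.0/16"]), "192.168.3.4")` evaluates to `True`. The cost per lookup grows with the logarithm of the number of merged ranges, which is why the number of ranges no longer needs a clause-style cap in this sketch.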