opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
120 stars 139 forks source link

[FEATURE] Add Support for IP Data Type #3145

Open currantw opened 2 weeks ago

currantw commented 2 weeks ago

Is your feature request related to a problem?

OpenSearch SQL plugin does not support the IP address field type.

🚫 IP address fields are converted to strings:

search=weblog | fields host

returns host with field type string.

🚫 IP address fields cannot be correctly for equality:

search=weblog | where host = "2001:0db7::ff00:42:8329" | fields host

will not return the value 2001:0db7:0000:0000:0000:ff00:0042:8329, even though both strings represent the same IP address.

What solution would you like?

Outcomes:

Proposed Solution:

What alternatives have you considered?

None

Do you have any additional context?

This is closely related to https://github.com/opensearch-project/sql/pull/3110. This issue added a new cidrmatch(ip, cidr) function that returns whether the given IP address is within the specified CIDR IP address range. As part of this work, the SQL plugin was updated to cast IP addresses to strings - previously, it would raise an exception.

kedbirhan commented 2 days ago

we really need this feature, we can't even do basic IP lookups right now!

query

SELECT * 
FROM logs-vpc
WHERE dstaddr = "10.100.138.82"
LIMIT 10;

log

400 Bad Request: "{<EOL>  "error": {<EOL>    "reason": "Invalid SQL query",<EOL>    "details": "= function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[STRING,STRING],[BOOLEAN,BOOLEAN],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[TIMESTAMP,TIMESTAMP],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [IP,STRING]",<EOL>    "type": "ExpressionEvaluationException"<EOL>  },<EOL>  "status": 400<EOL>}"
YANG-DB commented 22 hours ago

Is your feature request related to a problem?

OpenSearch SQL plugin does not support the IP address field type.

🚫 IP address fields are converted to strings:

search=weblog | fields host

returns host with field type string.

🚫 IP address fields cannot be correctly for equality:

search=weblog | where host = "2001:0db7::ff00:42:8329" | fields host

will not return the value 2001:0db7:0000:0000:0000:ff00:0042:8329, even though both strings represent the same IP address.

What solution would you like?

Outcomes:

  • IP addresses can be retrieved using the OpenSearch SQL plugin without conversion to strings.
  • IP addresses supports equality operations (= and !=).
  • IP addresses supports comparison operations (<, <=, >, and >) if they are both IPv4 or IPv6.
  • IP addresses supports sorting (again, if they are all IPv4 or IPv6).
  • IP addresses work with IP-specific functions (currently only cidrmatch - see Add CIDR function to PPL (#3036) #3110).

Proposed Solution:

  • Add an new IP type to ExprCoreType.
  • Replace OpenSearchExprIpValue with ExprIpValue , and update implementation.
  • Update OpenSearchDataType.MappingType to map "ip" fields to ExprCoreType.IP.
  • Update OpenSearchExprValueFactory.
  • Update other code, unit tests, and integration tests as necessary.

What alternatives have you considered?

None

Do you have any additional context?

This is closely related to #3110. This issue added a new cidrmatch(ip, cidr) function that returns whether the given IP address is within the specified CIDR IP address range. As part of this work, the SQL plugin was updated to cast IP addresses to strings - previously, it would raise an exception.

Hi @currantw - this is a great idea my only concern is how to differentiate between a keyword field and an ip field - it can infer this implicitly but its not always the case that a user knows this in advanced ...

Can we explicitly use an ip syntax to have both the query writer and the query parser engine be aware this field is expected to be an ip and not text/keyword ?

This should be applicable for any <, <=, >, and > predicate operators

search=weblog | where host = ip("2001:0db7::ff00:42:8329") | fields host