opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0

[FEATURE] JSON Search Support #2652

Open brijos opened 2 months ago

brijos commented 2 months ago

Is your feature request related to a problem? Community members have asked for easier JSON parsing and analysis capabilities that let them not only search JSON logs and extract fields without writing complex parse expressions, but also perform computations on JSON array values, such as summing all values in an array whose number of elements is not known in advance.

What solution would you like? Allow users to extract and transform data from JSON-formatted events and fields. Users should be able to extract all values in an array by specifying a wildcard for the element position and then run an aggregation on them. Users should be able to extract the following and perform operations on the values:

  1. single or multiple top-level fields
  2. nested fields
  3. keys in arrays
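To make the wildcard request concrete, here is a minimal Python sketch of the intended semantics. It is not PPL or SQL syntax and not part of the plugin: the helper `extract_wildcard`, the dotted path notation, and the sample event are all hypothetical, chosen only to show a `*` that matches every array element so the values can be aggregated without knowing the array length.

```python
import json


def extract_wildcard(doc, path):
    """Resolve a dotted path; a '*' segment matches every element of a list.

    Hypothetical helper for illustration only. Returns a list of all matches.
    """
    current = [doc]
    for key in path.split("."):
        next_values = []
        for value in current:
            if key == "*" and isinstance(value, list):
                # Wildcard: fan out over every element of the array.
                next_values.extend(value)
            elif isinstance(value, dict) and key in value:
                next_values.append(value[key])
        current = next_values
    return current


# Made-up sample event with an array of unknown length.
event = json.loads(
    '{"service": "checkout", "latencies": [{"ms": 12}, {"ms": 30}, {"ms": 8}]}'
)

values = extract_wildcard(event, "latencies.*.ms")
total = sum(values)  # aggregate over all array elements
```

The same helper covers the three cases listed above: a top-level field (`service`), a nested field, and a key inside array elements (`latencies.*.ms`).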

Examples:

What alternatives have you considered? No other solutions are available in PPL.

Do you have any additional context? No

kedbirhan commented 2 months ago
SELECT
  json_extract(myblob, '$.name') AS name,
  json_extract(myblob, '$.projects') AS projects
FROM dataset

It would help a lot to support a json_extract function for manipulating JSON string fields, as shown above.

dblock commented 3 weeks ago

Catch All Triage - 1 2 3 4 5 6

salyh commented 2 weeks ago

@anasalkouz @YANG-DB @brijos

JSON Functions Proposal

I have created a working prototype for the json_extract function.

This is currently implemented in the sql sub-project so that the JSON functions are available not only as a PPL command. In other words, the function can be used (like any other built-in function) in both SQL and PPL.

The proposed (and so far implemented) syntax is:

json_extract(<json>,<path>)

<json> JSON as a string, taken from a table cell or given as a literal (mandatory).

<path> A JSON Path or a JSON Pointer expression (mandatory).

The function returns the result as a string (a scalar value or full JSON).

An error is thrown when:

No error is thrown when:

Examples:

#JSON Path
select json_extract('{\"name\":\"saly\"}', '$.name')
#JSON Pointer
select json_extract('{\"name\":\"saly\"}', '/name')
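For readers who want to see the string-to-string idea outside the plugin, here is a minimal Python sketch. It is not the prototype's code: the name `json_extract_sketch` is hypothetical, the JSON Path branch only handles plain `$.a.b` member access, and the JSON Pointer branch follows RFC 6901 escaping. The proposal does not state what happens for a missing path, so this sketch simply lets Python's lookup errors propagate; that is an assumption, not the proposed behavior.

```python
import json


def json_extract_sketch(json_text, path):
    """String in, string out: resolve a JSON Path or JSON Pointer.

    Hypothetical illustration; supports only '$.a.b' style JSON Path.
    """
    doc = json.loads(json_text)

    if path.startswith("$"):
        # Minimal JSON Path subset: '$' followed by dot-separated names.
        tokens = [t for t in path[1:].split(".") if t]
    elif path == "" or path.startswith("/"):
        # JSON Pointer (RFC 6901): '~1' decodes to '/', then '~0' to '~'.
        tokens = [
            t.replace("~1", "/").replace("~0", "~")
            for t in path.split("/")[1:]
        ]
    else:
        raise ValueError("path must start with '$' or '/'")

    node = doc
    for token in tokens:
        if isinstance(node, list):
            node = node[int(token)]  # array index
        else:
            node = node[token]  # object member (KeyError if missing)

    # Scalars that are already strings are returned as-is;
    # everything else is serialized back to JSON text.
    return node if isinstance(node, str) else json.dumps(node)


by_path = json_extract_sketch('{"name":"saly"}', "$.name")
by_pointer = json_extract_sketch('{"name":"saly"}', "/name")
```

Dispatching on the first character works here because a JSON Path expression starts with `$`, while a non-empty JSON Pointer starts with `/`.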

Open questions:

  1. Should it be implemented as a built-in SQL function?
  2. Is the string-to-string approach sufficient (JSON input as a string, function output as a string)?

Not yet covered

salyh commented 1 week ago

@anasalkouz @YANG-DB @rupal-bq any comments on the proposal so far?

nateynateynate commented 6 days ago

Can we perhaps specify a document ID to use as the json for the query? I've got a lot of json blobs that I'd love to search through instead of breaking them up before ingest.

salyh commented 1 day ago

> Can we perhaps specify a document ID to use as the json for the query? I've got a lot of json blobs that I'd love to search through instead of breaking them up before ingest.

Can you post an example of what this could look like?