opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices on remote data stores.
Apache License 2.0

[FEATURE] Add `TO_JSON_STRING` and `ARRAY_LENGTH` functions and refactor the `JSON` function #869

Closed LantaoJin closed 2 weeks ago

LantaoJin commented 2 weeks ago

Is your feature request related to a problem? A follow-up of https://github.com/opensearch-project/opensearch-spark/issues/667

  1. Refactor `JSON`: currently `JSON` evaluates STRING/JSON_ARRAY/JSON_OBJECT, which causes a lot of confusion. In Splunk, data can be stored as a JSON object, so its `json` function evaluates JSON objects. Spark, however, has no JSON object type; it has STRING, StructType, and ArrayType. After the refactor, the `JSON` function only evaluates STRING input.
  2. Add `ARRAY_LENGTH`: for the same reason as above, split `JSON_ARRAY_LENGTH` into `JSON_ARRAY_LENGTH` and `ARRAY_LENGTH`. `JSON_ARRAY_LENGTH` accepts only STRING input; `ARRAY_LENGTH` accepts only ArrayType input.
  3. Add `TO_JSON_STRING`: after the refactor in (1), we still need a way to convert a JSON_ARRAY/JSON_OBJECT back into a valid JSON STRING. `TO_JSON_STRING` accepts both StructType and ArrayType input and returns a JSON-formatted string.
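The proposed split of responsibilities can be sketched in plain Python (an illustrative model only, not the actual Scala implementation; here a Python `str` stands in for STRING, and `dict`/`list` stand in for Spark's StructType/ArrayType values):

```python
import json

def json_eval(value):
    """JSON: accepts only STRING input after the refactor; returns the
    parsed value for valid JSON, null (None) otherwise."""
    if not isinstance(value, str):
        raise TypeError("JSON accepts only STRING input")
    try:
        return json.loads(value)
    except ValueError:
        return None

def json_array_length(value):
    """JSON_ARRAY_LENGTH: accepts only a STRING; returns the length if it
    parses to a JSON array, null (None) otherwise."""
    if not isinstance(value, str):
        raise TypeError("JSON_ARRAY_LENGTH accepts only STRING input")
    try:
        parsed = json.loads(value)
    except ValueError:
        return None
    return len(parsed) if isinstance(parsed, list) else None

def array_length(value):
    """ARRAY_LENGTH: accepts only an ArrayType value (a list here)."""
    if not isinstance(value, list):
        raise TypeError("ARRAY_LENGTH accepts only ArrayType input")
    return len(value)

def to_json_string(value):
    """TO_JSON_STRING: accepts StructType (dict here) or ArrayType (list)
    and returns a JSON-formatted string."""
    if not isinstance(value, (dict, list)):
        raise TypeError("TO_JSON_STRING accepts StructType or ArrayType input")
    return json.dumps(value)
```

The point of the split is that each function has exactly one input type, so a caller never has to guess whether a value will be treated as raw text or as an already-structured array/object.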