opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices on remote data stores.
Apache License 2.0

[FEATURE] Several date and time functions are missing from Spark based PPL driver #722

Open · A-Gray-Cat opened 3 weeks ago

A-Gray-Cat commented 3 weeks ago

Is your feature request related to a problem?

  1. DATE_SUB() function is missing from Spark based PPL driver.
  2. NOW() function is missing.

Both would return the following error:

{"Message":"Fail to run query. Cause: Not Supported operation : Function"}

What solution would you like? Support both functions in the Spark-based PPL driver.

What alternatives have you considered?

Do you have any additional context?

YANG-DB commented 3 weeks ago

Adding additional context here:

The following query needs to be supported:

source = http_logs
| eval c1 = unix_timestamp(@timestamp)
| eval c2 = now()
| eval c3 = DAY_OF_WEEK(@timestamp)
| eval c4 = DAY_OF_MONTH(@timestamp)
| eval c5 = DAY_OF_YEAR(@timestamp)
| eval c6 = WEEK_OF_YEAR(@timestamp)
| eval c7 = WEEK(@timestamp)
| eval c8 = MONTH_OF_YEAR(@timestamp)
| eval c9 = HOUR_OF_DAY(@timestamp)
| eval c10 = MINUTE_OF_HOUR(@timestamp)
| eval c11 = SECOND_OF_MINUTE(@timestamp)
| eval c12 = LOCALTIME()
| fields c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12
| head 10
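As a reference point, the date-part functions in the query above can be cross-checked against plain java.time. The field mappings used here (DAY_OF_WEEK counted with Sunday = 1, WEEK_OF_YEAR as ISO weeks) are assumptions about the intended semantics, not taken from the driver:

```scala
import java.time.LocalDateTime
import java.time.temporal.WeekFields

// Sample timestamp; the mappings below are assumed semantics for the
// PPL functions, not the driver's actual implementation.
val ts = LocalDateTime.parse("2024-10-09T16:51:49")

val dayOfWeek      = ts.get(WeekFields.SUNDAY_START.dayOfWeek)  // DAY_OF_WEEK: Sunday = 1, so Wednesday -> 4
val dayOfMonth     = ts.getDayOfMonth                           // DAY_OF_MONTH     -> 9
val dayOfYear      = ts.getDayOfYear                            // DAY_OF_YEAR      -> 283 (2024 is a leap year)
val weekOfYear     = ts.get(WeekFields.ISO.weekOfWeekBasedYear) // WEEK_OF_YEAR     -> 41 (ISO week)
val monthOfYear    = ts.getMonthValue                           // MONTH_OF_YEAR    -> 10
val hourOfDay      = ts.getHour                                 // HOUR_OF_DAY      -> 16
val minuteOfHour   = ts.getMinute                               // MINUTE_OF_HOUR   -> 51
val secondOfMinute = ts.getSecond                               // SECOND_OF_MINUTE -> 49
```

These values line up with the spark-shell output posted later in this thread for the same wall-clock time.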

Here are the specifications

@LantaoJin can you please verify and add tests for these date conversion functions?

A-Gray-Cat commented 3 weeks ago

and please don't forget NOW() and DATE_SUB 😄

salyh commented 2 weeks ago

On the main branch I can successfully execute:

scala> val dfp = spark.sql("source = dev.default.iot | eval @timestamp = now() | eval c1 = unix_timestamp(@timestamp) | eval c2 = now() | eval c3 = DAY_OF_WEEK(@timestamp) | eval c4 = DAY_OF_MONTH(@timestamp) | eval c5 = DAY_OF_YEAR(@timestamp) | eval c6 = WEEK_OF_YEAR(@timestamp) | eval c7 = WEEK(@timestamp) | eval c8 = MONTH_OF_YEAR(@timestamp) | eval c9 = HOUR_OF_DAY(@timestamp) | eval c10 = MINUTE_OF_HOUR(@timestamp) | eval c11 = SECOND_OF_MINUTE(@timestamp) | eval c12 =LOCALTIME() | fields c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12 | head 10"); dfp.show()

+----------+--------------------+---+---+---+---+---+---+---+---+---+--------------------+
|        c1|                  c2| c3| c4| c5| c6| c7| c8| c9|c10|c11|                 c12|
+----------+--------------------+---+---+---+---+---+---+---+---+---+--------------------+
|1728485509|2024-10-09 16:51:...|  4|  9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...|  4|  9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...|  4|  9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...|  4|  9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...|  4|  9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
+----------+--------------------+---+---+---+---+---+---+---+---+---+--------------------+

So it seems all functions mentioned in https://github.com/opensearch-project/opensearch-spark/issues/722#issuecomment-2394358931 and now() are already working.
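Sanity-checking c1 in the table above: 1728485509 epoch seconds corresponds to 2024-10-09 16:51:49 at UTC+2, so the session apparently ran in a UTC+2 timezone (an inference from the output, not stated in the thread). In plain java.time:

```scala
import java.time.{LocalDateTime, ZoneOffset}

// 2024-10-09 16:51:49 local time, at an assumed UTC+2 session timezone
val local = LocalDateTime.parse("2024-10-09T16:51:49")
val epoch = local.toEpochSecond(ZoneOffset.ofHours(2))
// epoch == 1728485509L, matching c1 in the table above
```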

salyh commented 2 weeks ago

DATE_SUB is also working; the syntax is date_sub(start_date, num_days), which returns the date that is num_days before start_date. https://spark.apache.org/docs/3.5.1/api/sql/index.html#date_sub


scala>  val dfp = spark.sql("source=dev.default.iot | eval a = DATE_SUB(DATE('2008-01-02'), 31)");dfp.show()
+-----------+---------+--------------------+----------+
|temperature|device-id|           timestamp|         a|
+-----------+---------+--------------------+----------+
|         12|     1492|2023-04-06 17:07:...|2007-12-02|
|         12|     1492|2023-04-06 17:07:...|2007-12-02|
|         13|      256|2023-04-06 17:07:...|2007-12-02|
|         14|      257|2023-04-06 17:07:...|2007-12-02|
|         15|      258|2023-04-06 17:07:...|2007-12-02|
+-----------+---------+--------------------+----------+
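The same arithmetic in plain java.time, just to make the date_sub semantics concrete (whole-day subtraction, no time component):

```scala
import java.time.LocalDate

// date_sub(DATE('2008-01-02'), 31): subtract 31 whole days
val start  = LocalDate.parse("2008-01-02")
val result = start.minusDays(31)
// result.toString == "2007-12-02", matching column `a` above
```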

A-Gray-Cat commented 2 weeks ago

Hello @salyh, thanks for the update. I suspect the version I'm testing with doesn't have these functions available. In addition, could you check whether the time() function is also available? Running queries that fall within a specific time window would require it, e.g. ... | where time_dt > TIME('2024-10-09').

Thanks.

A-Gray-Cat commented 2 weeks ago

And is this README up to date? https://github.com/opensearch-project/opensearch-spark/blob/fd3f82fc16273d47f17aa47e699aeaa8b4c3679d/ppl-spark-integration/README.md

salyh commented 2 weeks ago

> Hello @salyh, thanks for the update. I suspect the version I'm testing with doesn't have these functions available. In addition, could you check whether the time() function is also available? Running queries that fall within a specific time window would require it, e.g. ... | where time_dt > TIME('2024-10-09').
>
> Thanks.

time() is not supported, but timestamp() is:

scala> val dfp = spark.sql("source=dev.default.iot | eval time = timestamp('2015-03-05T09:32:05.359')");dfp.show()
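The literal accepted by timestamp() above is an ISO-8601 local date-time with millisecond precision; java.time parses the same string directly (shown here only to pin down the expected input format, not the driver's parser):

```scala
import java.time.LocalDateTime

// Same literal as in the PPL example above: ISO-8601 local date-time
val t = LocalDateTime.parse("2015-03-05T09:32:05.359")
// t.getNano == 359000000, i.e. the .359 milliseconds
```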

salyh commented 2 weeks ago

> And is this README up to date? https://github.com/opensearch-project/opensearch-spark/blob/fd3f82fc16273d47f17aa47e699aeaa8b4c3679d/ppl-spark-integration/README.md

I don't think so. This https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/README.md should be up to date (right @YANG-DB?).

A-Gray-Cat commented 2 weeks ago

That's good to know. Thanks! I also noticed the syntax of the date_sub function is different from how it's documented here: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/functions/datetime.rst#date-sub

A-Gray-Cat commented 2 weeks ago

Can we make sure that the syntax for the same function is identical across the non-Spark-based and Spark-based PPL?

kenrickyap commented 2 weeks ago

will start development on this ticket

kenrickyap commented 2 weeks ago

Hi @YANG-DB, just wanted to clarify the intention of this ticket. Is it to:

  1. Ensure that all datetime PPL functions implemented in opensearch-sql are also implemented in opensearch-spark, or
  2. Ensure that all datetime functions implemented in opensearch-spark have syntax consistent with their opensearch-sql counterparts?

Thanks!

kenrickyap commented 1 week ago

From investigation, the following functions are currently unsupported in Spark:

Other observations