Open A-Gray-Cat opened 3 weeks ago
Adding additional context here:
The following query needs to be supported:
source = http_logs
| eval c1 = unix_timestamp(@timestamp)
| eval c2 = now()
| eval c3 = DAY_OF_WEEK(@timestamp)
| eval c4 = DAY_OF_MONTH(@timestamp)
| eval c5 = DAY_OF_YEAR(@timestamp)
| eval c6 = WEEK_OF_YEAR(@timestamp)
| eval c7 = WEEK(@timestamp)
| eval c8 = MONTH_OF_YEAR(@timestamp)
| eval c9 = HOUR_OF_DAY(@timestamp)
| eval c10 = MINUTE_OF_HOUR(@timestamp)
| eval c11 = SECOND_OF_MINUTE(@timestamp)
| eval c12 = LOCALTIME()
| fields c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12
| head 10
@LantaoJin can you please verify and add tests for these date conversions?
and please don't forget NOW()
and DATE_SUB
😄
On the main branch I can successfully execute:
scala> val dfp = spark.sql("source = dev.default.iot | eval @timestamp = now() | eval c1 = unix_timestamp(@timestamp) | eval c2 = now() | eval c3 = DAY_OF_WEEK(@timestamp) | eval c4 = DAY_OF_MONTH(@timestamp) | eval c5 = DAY_OF_YEAR(@timestamp) | eval c6 = WEEK_OF_YEAR(@timestamp) | eval c7 = WEEK(@timestamp) | eval c8 = MONTH_OF_YEAR(@timestamp) | eval c9 = HOUR_OF_DAY(@timestamp) | eval c10 = MINUTE_OF_HOUR(@timestamp) | eval c11 = SECOND_OF_MINUTE(@timestamp) | eval c12 =LOCALTIME() | fields c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12 | head 10"); dfp.show()
+----------+--------------------+---+---+---+---+---+---+---+---+---+--------------------+
| c1| c2| c3| c4| c5| c6| c7| c8| c9|c10|c11| c12|
+----------+--------------------+---+---+---+---+---+---+---+---+---+--------------------+
|1728485509|2024-10-09 16:51:...| 4| 9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...| 4| 9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...| 4| 9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...| 4| 9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
|1728485509|2024-10-09 16:51:...| 4| 9|283| 41| 41| 10| 16| 51| 49|2024-10-09 16:51:...|
+----------+--------------------+---+---+---+---+---+---+---+---+---+--------------------+
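As a quick sanity check on those values, the c3/c4/c5/c6 columns for 2024-10-09 can be reproduced with plain java.time, no Spark session required (a sketch; note Spark's dayofweek numbering differs from java.time's, hence the small conversion):

```scala
import java.time.LocalDate
import java.time.temporal.WeekFields

// Cross-check the dfp.show() output above for the date 2024-10-09.
// Spark's dayofweek is 1 = Sunday .. 7 = Saturday, while java.time's
// getDayOfWeek.getValue is 1 = Monday .. 7 = Sunday, so convert.
val d = LocalDate.of(2024, 10, 9)

val dayOfWeekSparkStyle = (d.getDayOfWeek.getValue % 7) + 1 // 4 (Wednesday)
val dayOfMonth = d.getDayOfMonth                            // 9
val dayOfYear  = d.getDayOfYear                             // 283
val weekOfYear = d.get(WeekFields.ISO.weekOfWeekBasedYear)  // 41
```

The values line up with the c3, c4, c5, and c6 columns in the table above.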
So it seems all functions mentioned in https://github.com/opensearch-project/opensearch-spark/issues/722#issuecomment-2394358931, as well as now(), are already working.
DATE_SUB is also working; the syntax is date_sub(start_date, num_days), which returns the date that is num_days before start_date.
https://spark.apache.org/docs/3.5.1/api/sql/index.html#date_sub
scala> val dfp = spark.sql("source=dev.default.iot | eval a = DATE_SUB(DATE('2008-01-02'), 31)");dfp.show()
+-----------+---------+--------------------+----------+
|temperature|device-id| timestamp| a|
+-----------+---------+--------------------+----------+
| 12| 1492|2023-04-06 17:07:...|2007-12-02|
| 12| 1492|2023-04-06 17:07:...|2007-12-02|
| 13| 256|2023-04-06 17:07:...|2007-12-02|
| 14| 257|2023-04-06 17:07:...|2007-12-02|
| 15| 258|2023-04-06 17:07:...|2007-12-02|
+-----------+---------+--------------------+----------+
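The date_sub result above can also be cross-checked with plain calendar arithmetic via java.time (a quick sketch, no Spark needed):

```scala
import java.time.LocalDate

// date_sub(DATE('2008-01-02'), 31) should equal 2008-01-02 minus 31 days,
// i.e. the value shown in the `a` column above.
val result = LocalDate.of(2008, 1, 2).minusDays(31)
// result == 2007-12-02
```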
Hello @salyh, thanks for the update. I suspect the version I'm testing with doesn't have these functions available. In addition, could you check whether the time() function is also available? Running queries that fall within a specific time range would require it, e.g. ... | where time_dt > TIME('2024-10-09').
Thanks.
time() is not supported, but timestamp() is:
scala> val dfp = spark.sql("source=dev.default.iot | eval time = timestamp('2015-03-05T09:32:05.359')");dfp.show()
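So a time-range filter like the one asked about above could presumably be expressed with timestamp() instead of TIME() (a sketch against the hypothetical dev.default.iot table, untested):

```
source = dev.default.iot | where timestamp > timestamp('2024-10-09 00:00:00')
```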
And is this README up to date? https://github.com/opensearch-project/opensearch-spark/blob/fd3f82fc16273d47f17aa47e699aeaa8b4c3679d/ppl-spark-integration/README.md
I don't think so. This https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/README.md should be up to date (right, @YANG-DB?)
That's good to know. Thanks! I also noticed the syntax of the date_sub function is different from how it's documented here: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/functions/datetime.rst#date-sub
Can we make sure that the syntax for the same function is identical across non-Spark-based and Spark-based PPL?
will start development on this ticket
Hi @YANG-DB, just wanted to clarify the intention of this ticket. Is the intention of this ticket to:
Thanks!
From investigation, the following functions are currently unsupported in Spark:
Other observations
Is your feature request related to a problem?
Both would return the following error:
What solution would you like? Bring both functions to Spark based PPL driver:
What alternatives have you considered?
Do you have any additional context?