timeplus-io / proton

A streaming SQL engine, a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse.
https://timeplus.com
Apache License 2.0
1.49k stars 62 forks source link

Question regarding lag #806

Open KVeschgini opened 1 month ago

KVeschgini commented 1 month ago

Can someone please help me understand the result of the following query:

SELECT
  *
FROM
  (
    SELECT
      signal_name, value_float64, lag(value_float64), value_float64 - lag(value_float64)
    FROM
      telemetry
    PARTITION BY
      signal_name
  )
WHERE
  signal_name = 'Speed'
LIMIT 5

Query id: 881b81a5-7fa0-4240-a424-31bb765e64ec

┌─signal_name─┬─value_float64─┬─lag(value_float64)─┬─minus(value_float64, lag(value_float64))─┐
│ Speed       │     48.506393 │                  0 │                            -68499.438607 │
└─────────────┴───────────────┴────────────────────┴──────────────────────────────────────────┘
┌─signal_name─┬─value_float64─┬─lag(value_float64)─┬─minus(value_float64, lag(value_float64))─┐
│ Speed       │      48.48633 │          48.506393 │                              48.08114934 │
└─────────────┴───────────────┴────────────────────┴──────────────────────────────────────────┘
┌─signal_name─┬─value_float64─┬─lag(value_float64)─┬─minus(value_float64, lag(value_float64))─┐
│ Speed       │     48.467033 │           48.48633 │                       -8.410789999999999 │
└─────────────┴───────────────┴────────────────────┴──────────────────────────────────────────┘
┌─signal_name─┬─value_float64─┬─lag(value_float64)─┬─minus(value_float64, lag(value_float64))─┐
│ Speed       │     48.448513 │          48.467033 │                               48.0331646 │
└─────────────┴───────────────┴────────────────────┴──────────────────────────────────────────┘
┌─signal_name─┬─value_float64─┬─lag(value_float64)─┬─minus(value_float64, lag(value_float64))─┐
│ Speed       │     48.430805 │          48.448513 │                             48.413310828 │
└─────────────┴───────────────┴────────────────────┴──────────────────────────────────────────┘

lag(value_float64) is as expected the same as the value_float64 from the last row but the difference value_float64 - lag(value_float64) seems to be arbitrary. What am I doing wrong here?

jovezhong commented 1 month ago

Hi @KVeschgini , I am a bit surprised to see this. I don't have your data set, but on one of our demo server, I can get the correct diff

select cid, speed_kmh,lag(speed_kmh),speed_kmh-lag(speed_kmh) from car_live_data where cid like 'c0000%' partition by cid

the url-addressable link (you only need to use Google/Micosoft ID to SSO): https://demo.timeplus.cloud/default/console/query?sql=select+cid,+speed_kmh,lag(speed_kmh),speed_kmh-lag(speed_kmh)+from+car_live_data+where+cid+like+'c0000%25'+partition+by+cid image

Timeplus leverages the similar query engine as ClickHouse, which provides some shortcut to compute such a-b, when b is defined in the same query. In many other database, you have to create a CTE/subquery. So you may try this "safe" approach

SELECT
  signal_name, value_float64,  theLag, value_float64 - theLag
FROM
  (
    SELECT
      signal_name, value_float64, lag(value_float64) as theLag
    FROM
      telemetry
    PARTITION BY
      signal_name
  )
WHERE
  signal_name = 'Speed'
LIMIT 5

Also, maybe you can create such question in the Discussions tab, instead of an Issue. No big deal, just try to consolidate the questions from the community. Thanks for trying Timeplus and sharing your feedback.

chenziliang commented 1 month ago

@KVeschgini May i ask which version of proton you are using ? Thanks. SELECT version()

yl-lisen commented 1 month ago

@KVeschgini So far, Proton cannot correctly interpret/handle this expression value_float64 - lag(value_float64) (stateless function + stateful function) in the current PARTITION BY implementation. This is a problem we need to solve.

Before we solve it, you may try "safe" approach mentioned by @jovezhong .

SELECT
  signal_name, value_float64,  theLag, value_float64 - theLag
FROM
  (
    SELECT
      signal_name, value_float64, lag(value_float64) as theLag
    FROM
      telemetry
    PARTITION BY
      signal_name
  )
WHERE
  signal_name = 'Speed'
LIMIT 5
KVeschgini commented 1 month ago

I posted the question here because I suspected it to be a bug. I am using proton version 1.5.14. I found out that value_float64 - lag(value_float64) dosen't respect the partition and takes the value from the global last row.

Thanks @yl-lisen for confirming the problem.