michelin / snowflake-grafana-datasource

Snowflake Grafana datasource plugin enables the visual representation of Snowflake data within Grafana dashboards and manages alerts.
Apache License 2.0
67 stars 33 forks source link

timezone conversion is inconsistent and prevents from partition pruning #56

Closed rumbin closed 8 months ago

rumbin commented 9 months ago

Observed behavior

A simple query like this one...

SELECT
    $__timeGroup(timestamp_ms, $__interval, previous) as time
    , count(*) as nb 
FROM my_table
WHERE true
    and $__timeFilter(timestamp_ms) 
GROUP BY
    1

...gets translated to...

SELECT
    TIME_SLICE(TO_TIMESTAMP(timestamp_ms), 60, 'SECOND', 'START') as time
    , count(*) as nb 
FROM my_table
WHERE true
    and CONVERT_TIMEZONE('UTC', 'UTC', timestamp_ms) > '2023-12-06T08:00:26.372Z' AND CONVERT_TIMEZONE('UTC', 'UTC', timestamp_ms) < '2023-12-08T08:00:26.372Z' 
GROUP BY
    1

I see two issues with this behavior:

  1. The timezone conversion is only applied on the $__timeFilter() and not on the $__timeGroup(). This is inconsistent, imho, and may lead to misinterpretation of the visialized time scale. Or am I wrong here?
  2. The timezone conversion performed by the $__timeFilter() is too complex for Snowflake in order to apply partition pruning. (At least if the timestamp column belongs to a view which is projecting a VARIANT column to a timestamp.)

Expected behavior

  1. $__timeFilter() and $__timeGroup() apply the same timezone conversions.
  2. The timezone conversion of the $__timeFilter() is... a. ...entirely skipped if the from and to timezone of the conversion are identical (UTC) b. ...performed in a way which does not hamper partition pruning. I.e., we could probably convert the timezone of the filter values (the timestamp literals provided by Grafana) instead of converting the timezone of the timestamp column.

Happy to discuss this issue. Is there any Slack channel where we could sync on it?

rumbin commented 9 months ago

I just checked it with two different tables:

The latter issue can be worked around by resorting to $__timeFrom() and $__timeTo()

...
where true
    and timestamp_ms > $__timeFrom()
    and timestamp_ms < $__timeTo()

This way the partition pruning works beautifully. So it's just the timezone conversion logic that needs to be changed.

Background

Our raw/lake data is stored in VARIANT fields for being robust and schema agnostic. We make this data queriable by creating schema-aware views on these tables which we expose to our users. This way we can also apply dynamic data masking and other governance features.

devnied commented 9 months ago

Thanks for your analysis. I am ok with your first point $__timeFilter() and $__timeGroup() should apply the same timezone conversions (I open an issue for this one) For the second point, The only fix I can see is providing two new macro without timezone conversion.

rumbin commented 9 months ago

For the second point, The only fix I can see is providing two new macro without timezone conversion.

Can't we just transfer the $__timeFilter() timezone conversion from this logic...

and CONVERT_TIMEZONE('UTC', 'Europe/Berlin', timestamp_ms) > '2023-12-06T08:00:26.372Z' AND CONVERT_TIMEZONE('UTC', 'Europe/Berlin', timestamp_ms) < '2023-12-08T08:00:26.372Z' 

...to something like this logic?

and timestamp_ms > CONVERT_TIMEZONE('Europe/Berlin', 'UTC', '2023-12-06T08:00:26.372Z'::timestamp_ntz) AND timestamp_ms < CONVERT_TIMEZONE('Europe/Berlin', 'UTC', '2023-12-08T08:00:26.372Z')

This would generally be cheaper on the database side since the conversion doesn't need to be performed for each row but only once for the provided timestamp literals.