plausible / ecto_ch

Ecto ClickHouse adapter
MIT License
72 stars 10 forks source link

Unsupported JOIN ON conditions. #97

Closed daviaws closed 1 year ago

daviaws commented 1 year ago
Unexpected 'start_at >= datetime': While processing start_at >= datetime. (INVALID_JOIN_ON_EXPRESSION)

My query:

    date_range_cte(
      Timezone.from_datetime_to_date(start_date),
      Timezone.from_datetime_to_date(end_date)
    )
    |> join(:left, [d], vs in subquery(vehicle_summary_query),
    as: :daily_vehicle_summary
    )
    |> select([d, daily_vehicle_summary: dvs], {d.date, dvs})
    |> order_by([d], asc: d.date)
    |> RepoClickhouse.all()

Gives result:

[
  {~D[2016-01-01],
   %Hermes.Vehicles.VehicleSummaryReplicated{
     vehicle_id: "26df03b5-6a6b-4aba-87e4-8a4009805eed",
     start_at: ~N[2016-01-03 02:30:00],
     end_at: ~N[2016-01-03 02:59:59],
   }},
  {~D[2016-01-01],
   %Hermes.Vehicles.VehicleSummaryReplicated{
     vehicle_id: "26df03b5-6a6b-4aba-87e4-8a4009805eed",
     start_at: ~N[2016-01-01 03:00:00],
     end_at: ~N[2016-01-01 03:00:00],
   }},
  {~D[2016-01-02],
   %Hermes.Vehicles.VehicleSummaryReplicated{
     vehicle_id: "26df03b5-6a6b-4aba-87e4-8a4009805eed",
     start_at: ~N[2016-01-03 02:30:00],
     end_at: ~N[2016-01-03 02:59:59],
   }},
  {~D[2016-01-02],
   %Hermes.Vehicles.VehicleSummaryReplicated{
     vehicle_id: "26df03b5-6a6b-4aba-87e4-8a4009805eed",
     start_at: ~N[2016-01-01 03:00:00],
     end_at: ~N[2016-01-01 03:00:00],
   }}
]

I was expecting to return one per day instead all X all. But if I try to:

date_range_cte(
      Timezone.from_datetime_to_date(start_date),
      Timezone.from_datetime_to_date(end_date)
    )
    |> join(:left, [d], vs in subquery(vehicle_summary_query),
      on:
        vs.start_at >= d.datetime
        and
          vs.start_at <=
            fragment(
              "(? + interval '1 day -1 second')",
              d.datetime
            ),
      as: :daily_vehicle_summary
    )
    |> select([d, daily_vehicle_summary: dvs], {d.date, dvs})
    |> order_by([d], asc: d.date)
    |> RepoClickhouse.all()

I receive the error message. Plus I tried with date and datetime. Both with same result.

Any clues? :C

ruslandoga commented 1 year ago

👋 @daviaws

I was expecting to return one per day instead all X all

I think all X all is how left join ... on true is supposed to work.

I receive the error message.

Probably a ClickHouse limitation like https://github.com/ClickHouse/ClickHouse/issues/31416

ruslandoga commented 1 year ago

These types of queries can be handled with

daviaws commented 1 year ago

Note that even if I remove the and and the expression before or the expression after the query keep giving the same error message: much probably because the datetime comparison on join. But I dunno for sure.

daviaws commented 1 year ago

Yes, the problem seems to be with comparisons of datetime. If I compare 2 > 1, or 2 >= 1 it works. But if I compare something like: fragment("toUnixTimestamp(?) > toUnixTimestamp(?)", vs.start_at, d.date), even if I precompute the field in another row, even if I select the fields and they really are integers, it doesn't work.

Fun fact is, if I try this: vs.start_at == d.date it works.

I'm amazed.

Zarathustra2 commented 1 year ago

out of curiosity how does the generated SQL look like?

daviaws commented 1 year ago

Something like this with unix timestamp version :thinking:

{"WITH \"dates\" AS (SELECT toDate({$0:String}) + toIntervalDay(day_offset) AS date,
        toUnixTimestamp(date) AS unix_datetime
      FROM
      (
              SELECT arrayJoin(
                range(
                  0,
              toUInt32(toDate({$1:String})) - toUInt32(toDate({$2:String})) + 1
            )
          ) AS day_offset
      ) AS date) SELECT d0.\"unix_datetime\",s1.\"start_at_unix\"
      FROM \"dates\" AS d0 LEFT OUTER JOIN (SELECT sv0.\"vehicle_id\" AS \"vehicle_id\",sv0.\"start_at\" AS \"start_at\",sv0.\"end_at\" AS \"end_at\",toUnixTimestamp(sv0.\"start_at\") AS \"start_at_unix\",toUnixTimestamp(sv0.\"end_at\") AS \"end_at_unix\" FROM \"vehicle_summaries_replicated\" AS sv0 WHERE (sv0.\"vehicle_id\" IN ({$3:String})) AND ((sv0.\"start_at\" >= parseDateTimeBestEffort({$4:String})) AND (sv0.\"end_at\" <= parseDateTimeBestEffort({$5:String})))) AS s1 ON s1.\"start_at_unix\" >= d0.\"unix_datetime\" ORDER BY d0.\"date\"",
 ["2016-01-01", "2016-01-02", "2016-01-01",
  "364f516d-07c8-426f-86aa-2c81ea6f1991", "2016-01-01T00:00:00.000-03:00",
  "2016-01-02T23:59:59.999-03:00"]}