plausible / ecto_ch

ClickHouse adapter for Ecto
MIT License
79 stars 11 forks source link

Possible timestamp issue #45

Open Zarathustra2 opened 1 year ago

Zarathustra2 commented 1 year ago

Given the following test for timestamps_test.exs

  test "filtering datetimes" do
    # iso8601 type
    time = ~U[2023-04-20 17:00:25Z]

    account =
      %Account{}
      |> Account.changeset(%{name: "Test"})
      |> TestRepo.insert!()

    %Product{}
    |> Product.changeset(%{
      account_id: account.id,
      name: "Qux1",
      approved_at: time
    })
    |> TestRepo.insert!()

    from(
      u in Product,
      where: u.approved_at == ^time
    )
    |> TestRepo.exists?()
    |> IO.inspect()

    from(
      u in Product,
      where: fragment("? = ?", u.approved_at, ^time)
    )
    |> TestRepo.exists?()
    |> IO.inspect()

    from(
      u in Product,
      where: fragment("? = ?", u.approved_at, ^DateTime.to_unix(time, :millisecond))
    )
    |> TestRepo.exists?()
    |> IO.inspect()
  end

All 3 cases will print false for me.

Is this expected and if so why?

ruslandoga commented 1 year ago

It's probably your non-UTC ClickHouse server timezone issue. It happens due to timestamps in params being encoded as text and in rows -- as ints. I think I have "fixed" it in Ch a few days ago: https://github.com/plausible/ch#select-rows

But even with that change you'd still encounter issues with naive timestamp unless you switch to a UTC server.

Zarathustra2 commented 1 year ago

But in my test I am not using a native timestamp I am using a timestamp with an UTC offset. So if I insert that timestamp and reuse to filter for that row I should get that inserted row since ~U[2023-04-20 17:00:25Z] has a timezone.

So I kinda expect the tests to pass so I assume there is still something going wrong or am I misunderstanding something?

ruslandoga commented 1 year ago

For ~U[..] it should already be fixed in Ch master: https://github.com/plausible/ch#select-rows

Zarathustra2 commented 1 year ago

I updated my local chto to latest ch (commit d305fa092515a106a20f517e642228f75bbe2001) and now the test code above fails.

    from(
      u in Product,
      where: u.approved_at == ^time
    )
    |> TestRepo.exists?()
    |> IO.inspect()

^ This still yields false

    from(
      u in Product,
      where: fragment("? = ?", u.approved_at, ^time)
    )
    |> TestRepo.exists?()
    |> IO.inspect()

^ This now throws an error:

  1) test filtering datetimes (Ecto.Integration.TimestampsTest)
     stacktrace:
       (ch 0.1.0) lib/ch/query.ex:197: DBConnection.Query.Ch.Query.encode_param/1
       (ch 0.1.0) lib/ch/query.ex:182: anonymous fn/1 in DBConnection.Query.Ch.Query.query_params/1
       (elixir 1.14.3) lib/enum.ex:1658: Enum."-map/2-lists^map/1-0-"/2
       (ch 0.1.0) lib/ch/query.ex:100: DBConnection.Query.Ch.Query.encode/3
       (db_connection 2.5.0) lib/db_connection.ex:1336: DBConnection.maybe_encode/4
       (db_connection 2.5.0) lib/db_connection.ex:707: DBConnection.execute/4
       (ch 0.1.0) lib/ch.ex:51: Ch.query/4
       (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:438: Ecto.Adapters.SQL.query!/4
       (chto 0.1.0) lib/ecto/adapters/clickhouse.ex:235: Ecto.Adapters.ClickHouse.execute/5
       (ecto 3.10.1) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
       (ecto 3.10.1) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
       (ecto 3.10.1) lib/ecto/repo/queryable.ex:134: Ecto.Repo.Queryable.exists?/3
       test/ecto/integration/timestamps_test.exs:71: (test)
ruslandoga commented 1 year ago

I see. Checking

ruslandoga commented 1 year ago

@Zarathustra2 I've added the test and it seems to be passing for me (UTC timezone). I misunderstood 🙈

  test "filtering datetimes" do
    # iso8601 type
    time = ~U[2023-04-20 17:00:25Z]

    account =
      %Account{}
      |> Account.changeset(%{name: "Test"})
      |> TestRepo.insert!()

    %Product{}
    |> Product.changeset(%{
      account_id: account.id,
      name: "Qux1",
      approved_at: time
    })
    |> TestRepo.insert!()

    assert TestRepo.exists?(
             from u in Product,
               where: u.approved_at == ^time
           )

    assert TestRepo.exists?(
             from u in Product,
               where: fragment("? = ?", u.approved_at, ^time)
           )

    # `refute` because `DateTime` is stored as seconds, for milliseconds precision use `DateTime64(3)`
    refute TestRepo.exists?(
             from u in Product,
               where: fragment("? = ?", u.approved_at, ^DateTime.to_unix(time, :millisecond))
           )
  end
Zarathustra2 commented 1 year ago

Hmmm, if you run select timezone(); what do you get?

I get Europe/Berlin

ruslandoga commented 1 year ago

UTC

Zarathustra2 commented 1 year ago

What if you change ur database to Europe/Berlin or anything that is not on UTC timezone. I think then the test will fail

Zarathustra2 commented 1 year ago

at least that is the only difference I can now spot between your env and my as why the test fails

ruslandoga commented 1 year ago

It seems to be more complicated than I thought. Simply encoding UTC datetimes to unix timestamps in query strings is not enough.

ruslandoga commented 1 year ago

https://github.com/plausible/chto/pull/46 still fails on Europe/Berlin timezone.

ruslandoga commented 1 year ago

Seems like the timestamp inserted in RowBinary is also shifted by ClickHouse.

Zarathustra2 commented 1 year ago

Simply encoding UTC datetimes to unix timestamps in query strings is not enough.

That is kinda weird because technically according to their documentation that should do it

(Dealing with time & timezones must be the most annoying domain problem I encounter :D )

Zarathustra2 commented 1 year ago

Should the CI maybe also run with a different clickhouse timezone setting so that in the future we can catch future timezone issues?

ruslandoga commented 1 year ago

For reference, this is what Ch is doing now: https://github.com/plausible/ch/commit/cea3726a83800269158c8f4e5f4896e90c2d90d9

ruslandoga commented 1 year ago

Should the CI maybe also run with a different clickhouse timezone setting so that in the future we can catch future timezone issues?

Sure, but that'd would require reworking the tests a bit since naive datetimes would still fail. Probably a tag for each naive test like @tag :naive and ExUnit.start(except: [:naive]) in test_helper.exs based on some env var would be enough. But adding that right now is low priority since Plausible is all UTC :)

ruslandoga commented 1 year ago

I'll return to this issue after finishing https://github.com/plausible/chto/pull/43

Zarathustra2 commented 1 year ago

Thanks my friend. I am currently slammed with work (postgres -> clickhouse migration :D ) but looking to do https://github.com/plausible/chto/issues/37 over the weekend.

Zarathustra2 commented 1 year ago

I have taken another look, look at the queries:


false
16:18:06.013 [debug] QUERY OK source="products" db=2.9ms idle=191.2ms
SELECT true FROM "products" AS p0 WHERE (p0."approved_at" = {$0:DateTime}) LIMIT 1 [~N[2023-04-20 17:00:25]]

true
16:18:06.019 [debug] QUERY OK source="products" db=2.8ms idle=195.7ms
SELECT true FROM "products" AS p0 WHERE (p0."approved_at" = {$0:DateTime}) LIMIT 1 [~U[2023-04-20 17:00:25Z]]

for some reason the DateTime gets casted to a NaiveDateTime. I ahve not been successful yet to locate where this happens and why but this is the issue. If it would use not cast it it would behave the same as the fragment query

ruslandoga commented 1 year ago

Nice find, Ecto schemas use naive datetimes by default.

ruslandoga commented 1 year ago

@Zarathustra2 I've added Europe/Berlin to CI in https://github.com/plausible/ecto_ch/pull/46 and adapted tests to pass. I've also added a modified version of your test to highlight the difference in "product" vs Product Ecto Query sources. The former doesn't cast ~U[] to ~N[] and the latter does. That affects how these datetimes are encoded in SELECT query params.