Open Zarathustra2 opened 1 year ago
It's probably your non-UTC ClickHouse server timezone issue. It happens due to timestamps in params being encoded as text and in rows -- as ints. I think I have "fixed" it in Ch a few days ago: https://github.com/plausible/ch#select-rows
But even with that change you'd still encounter issues with naive timestamp unless you switch to a UTC server.
But in my test I am not using a native timestamp I am using a timestamp with an UTC offset. So if I insert that timestamp and reuse to filter for that row I should get that inserted row since ~U[2023-04-20 17:00:25Z]
has a timezone.
So I kinda expect the tests to pass so I assume there is still something going wrong or am I misunderstanding something?
For ~U[..]
it should already be fixed in Ch
master: https://github.com/plausible/ch#select-rows
I updated my local chto to latest ch (commit d305fa092515a106a20f517e642228f75bbe2001
) and now the test code above fails.
from(
u in Product,
where: u.approved_at == ^time
)
|> TestRepo.exists?()
|> IO.inspect()
^ This still yields false
from(
u in Product,
where: fragment("? = ?", u.approved_at, ^time)
)
|> TestRepo.exists?()
|> IO.inspect()
^ This now throws an error:
1) test filtering datetimes (Ecto.Integration.TimestampsTest)
stacktrace:
(ch 0.1.0) lib/ch/query.ex:197: DBConnection.Query.Ch.Query.encode_param/1
(ch 0.1.0) lib/ch/query.ex:182: anonymous fn/1 in DBConnection.Query.Ch.Query.query_params/1
(elixir 1.14.3) lib/enum.ex:1658: Enum."-map/2-lists^map/1-0-"/2
(ch 0.1.0) lib/ch/query.ex:100: DBConnection.Query.Ch.Query.encode/3
(db_connection 2.5.0) lib/db_connection.ex:1336: DBConnection.maybe_encode/4
(db_connection 2.5.0) lib/db_connection.ex:707: DBConnection.execute/4
(ch 0.1.0) lib/ch.ex:51: Ch.query/4
(ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:438: Ecto.Adapters.SQL.query!/4
(chto 0.1.0) lib/ecto/adapters/clickhouse.ex:235: Ecto.Adapters.ClickHouse.execute/5
(ecto 3.10.1) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
(ecto 3.10.1) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
(ecto 3.10.1) lib/ecto/repo/queryable.ex:134: Ecto.Repo.Queryable.exists?/3
test/ecto/integration/timestamps_test.exs:71: (test)
I see. Checking
@Zarathustra2 I've added the test and it seems to be passing for me (UTC timezone). I misunderstood 🙈
test "filtering datetimes" do
# iso8601 type
time = ~U[2023-04-20 17:00:25Z]
account =
%Account{}
|> Account.changeset(%{name: "Test"})
|> TestRepo.insert!()
%Product{}
|> Product.changeset(%{
account_id: account.id,
name: "Qux1",
approved_at: time
})
|> TestRepo.insert!()
assert TestRepo.exists?(
from u in Product,
where: u.approved_at == ^time
)
assert TestRepo.exists?(
from u in Product,
where: fragment("? = ?", u.approved_at, ^time)
)
# `refute` because `DateTime` is stored as seconds, for milliseconds precision use `DateTime64(3)`
refute TestRepo.exists?(
from u in Product,
where: fragment("? = ?", u.approved_at, ^DateTime.to_unix(time, :millisecond))
)
end
Hmmm, if you run select timezone();
what do you get?
I get Europe/Berlin
UTC
What if you change ur database to Europe/Berlin
or anything that is not on UTC timezone. I think then the test will fail
at least that is the only difference I can now spot between your env and my as why the test fails
It seems to be more complicated than I thought. Simply encoding UTC datetimes to unix timestamps in query strings is not enough.
https://github.com/plausible/chto/pull/46 still fails on Europe/Berlin
timezone.
Seems like the timestamp inserted in RowBinary is also shifted by ClickHouse.
Simply encoding UTC datetimes to unix timestamps in query strings is not enough.
That is kinda weird because technically according to their documentation that should do it
(Dealing with time & timezones must be the most annoying domain problem I encounter :D )
Should the CI maybe also run with a different clickhouse timezone setting so that in the future we can catch future timezone issues?
For reference, this is what Ch is doing now: https://github.com/plausible/ch/commit/cea3726a83800269158c8f4e5f4896e90c2d90d9
Should the CI maybe also run with a different clickhouse timezone setting so that in the future we can catch future timezone issues?
Sure, but that'd would require reworking the tests a bit since naive datetimes would still fail. Probably a tag for each naive test like @tag :naive
and ExUnit.start(except: [:naive])
in test_helper.exs
based on some env var would be enough. But adding that right now is low priority since Plausible is all UTC :)
I'll return to this issue after finishing https://github.com/plausible/chto/pull/43
Thanks my friend. I am currently slammed with work (postgres -> clickhouse migration :D ) but looking to do https://github.com/plausible/chto/issues/37 over the weekend.
I have taken another look, look at the queries:
false
16:18:06.013 [debug] QUERY OK source="products" db=2.9ms idle=191.2ms
SELECT true FROM "products" AS p0 WHERE (p0."approved_at" = {$0:DateTime}) LIMIT 1 [~N[2023-04-20 17:00:25]]
true
16:18:06.019 [debug] QUERY OK source="products" db=2.8ms idle=195.7ms
SELECT true FROM "products" AS p0 WHERE (p0."approved_at" = {$0:DateTime}) LIMIT 1 [~U[2023-04-20 17:00:25Z]]
for some reason the DateTime gets casted to a NaiveDateTime. I ahve not been successful yet to locate where this happens and why but this is the issue. If it would use not cast it it would behave the same as the fragment
query
Nice find, Ecto schemas use naive datetimes by default.
@Zarathustra2 I've added Europe/Berlin
to CI in https://github.com/plausible/ecto_ch/pull/46 and adapted tests to pass. I've also added a modified version of your test to highlight the difference in "product"
vs Product
Ecto Query sources. The former doesn't cast ~U[]
to ~N[]
and the latter does. That affects how these datetimes are encoded in SELECT
query params.
Given the following test for
timestamps_test.exs
All 3 cases will print false for me.
Is this expected and if so why?