plausible / ecto_ch

ClickHouse adapter for Ecto
MIT License
79 stars 11 forks source link

add union_distinct #204

Closed andyleclair closed 1 month ago

andyleclair commented 1 month ago

ClickHouse supports UNION DISTINCT and it would be cool if this library did too

https://clickhouse.com/docs/en/sql-reference/statements/select/union

ruslandoga commented 1 month ago

👋 @andyleclair

Thank you for the PR!

I think we should check with Ecto team if that's a safe approach to use this seemingly undocumented API.

ruslandoga commented 1 month ago

Actually, aren't UNION and UNION DISTINCT the same thing?

From https://hexdocs.pm/ecto/Ecto.Query.html#union/2:

Union expression returns only unique rows as if each query returned distinct results. This may cause a performance penalty. If you need to combine multiple result sets without removing duplicate rows consider using union_all/2.

And https://chatgpt.com/share/670a1756-8870-8010-b407-d6ab2fc13656

jojoca-appcues commented 1 month ago

They're supposed to be the same but clickhouse forces you to specify whether it's a distinct or all operation. In certain cases, the queries fail if you don't specify the union strategy.

ruslandoga commented 1 month ago

👋 @jojoca-appcues

Can you please provide more information? Do you mean this situation:

:) select * from values(1) union select * from values(2);
-- Received exception from server (version 24.3.3):
-- Code: 558. DB::Exception: Received from localhost:9000. DB::Exception: Expected ALL or DISTINCT in SelectWithUnion query, because setting (union_default_mode) is empty: While processing SELECT * FROM values(1) UNION SELECT * FROM values(2). (EXPECTED_ALL_OR_DISTINCT)

In that case, instead of adding union_distinct, shouldn't we change the SQL generated for union to be UNION DISTINCT instead of just UNION so that Ecto's unions work out of the box?

jojoca-appcues commented 1 month ago

Exactly that error! I agree changing UNION with UNION DISTINCT would do the trick

ruslandoga commented 1 month ago

Also we can do this (already):

Repo.query("select * from values(1) union select * from values(1)", [], settings: [union_default_mode: "DISTINCT"])

# or in Ecto's DSL
Repo.all(
  from(
    fragment("system.one"),
    select: 1
  )
  |> union(
    ^from(
      fragment("system.one"),
      select: 1
    )
  ),
  settings: [union_default_mode: "DISTINCT"]
)
jojoca-appcues commented 1 month ago

Ah this is cool! I'll try it here 😃

andyleclair commented 1 month ago

I support changing union to return UNION DISTINCT. The default behavior of erroring if you don't pass the union_default_mode is a little annoying. I'm happy to make that change if you want, then we can drop the Builder call

ruslandoga commented 1 month ago

@andyleclair thank you! Yes, let's do this!

ruslandoga commented 1 month ago

I'll publish a new version later today.

ruslandoga commented 1 month ago

Published as v0.4.0.

Thank you @andyleclair @jojoca-appcues!