plausible / ecto_ch

Ecto ClickHouse adapter
MIT License
72 stars 10 forks source link

Repo.all calls to gather/aggregate data using S3 table function seem to fail with readonly=1 #198

Closed oceanlewis closed 6 days ago

oceanlewis commented 1 week ago

I'm seeing an issue where the readonly setting that ecto_ch uses by default prevents reading data from S3 buckets.

This seems to happen because ecto_ch hardcodes the readonly setting on :all operations to readonly: 1.

I didn't see any other issues referencing this in the issue tracker so it's likely no one has run into this before. If there's not a reason for having readonly set to 2 would it make more sense to have that as the default to allow for this use-case? Or maybe adding some documentation and updating the put_setting function to replace the setting rather than adding it as a duplicate key which feels brittle?

Example

query = from(
  fragment("s3(?, CSV, \"example String\")", ^"s3://my-bucket/*/data.csv"),
  select: [:company_id]
)

Repo.all(query) # fails
Repo.all(query, [settings: [readonly: 2]]) # succeeds

Here's the error received on failure:

** (Ch.Error) Code: 164. DB::Exception: admin: Cannot execute query in readonly mode. (READONLY) (version 24.6.2.17 (official build))
oceanlewis commented 1 week ago

I'll open a PR later once I have a moment, but locally I'm also able to address the issue with this patch:

diff --git a/lib/ecto/adapters/clickhouse.ex b/lib/ecto/adapters/clickhouse.ex
index c1161bb..2a04b8a 100644
--- a/lib/ecto/adapters/clickhouse.ex
+++ b/lib/ecto/adapters/clickhouse.ex
@@ -330,7 +330,7 @@ defmodule Ecto.Adapters.ClickHouse do
     opts =
       case operation do
         :all ->
-          [{:command, :select} | put_setting(opts, :readonly, 1)]
+          [{:command, :select} | put_new_setting(opts, {:readonly, 2})]

         :delete_all ->
           [{:command, :delete} | opts]
@@ -403,9 +403,10 @@ defmodule Ecto.Adapters.ClickHouse do
     Ecto.Adapter.Queryable.plan_query(operation, Ecto.Adapters.ClickHouse, queryable)
   end

-  defp put_setting(opts, key, value) do
-    setting = {key, value}
-    Keyword.update(opts, :settings, [setting], fn settings -> [setting | settings] end)
+  defp put_new_setting(opts, {key, value} = new_setting) do
+    Keyword.update(opts, :settings, [new_setting], fn
+      existing_settings -> Keyword.put_new(existing_settings, key, value)
+    end)
   end

   @doc false
ruslandoga commented 1 week ago

👋 @oceanlewis

:readonly option was used for SELECT queries to better reflect the HTTP GET behavior (which we used before switching all queries to POST) which automatically sets readonly=1:

When using the GET method in the HTTP interface, readonly = 1 is set automatically. To modify data, use the POST method.

Right now I think it might make sense to remove this option completely. It's not like we can do anything dangerous with Repo.all https://github.com/plausible/ecto_ch/pull/199

oceanlewis commented 1 week ago

Hey! Thanks for taking the time to look at this one. Once this option is removed will we still need to pass in readonly: 2 explicitly? If the default is 1 then it does sound like that is the case.

ruslandoga commented 1 week ago

No, since all our requests are HTTP POST right now, there wouldn't be any implicit readonly option set. Or it would be readonly=0, kind of.

oceanlewis commented 1 week ago

@ruslandoga thank you for clarifying! This is exactly the outcome I was hoping for, thank you ❤️