paurkedal / ocaml-caqti

Cooperative-threaded access to relational data
https://paurkedal.github.io/ocaml-caqti/index.html
GNU Lesser General Public License v3.0

[WIP] Add support for array fields. #112

Closed. mefyl closed this 9 months ago.

mefyl commented 9 months ago

This preliminary work adds support for Postgresql's array fields.

AFAIK this is only supported by pg. I do realize this enables writing non-backend-agnostic queries, and I'm not sure what the policy is regarding that. In our specific case, not being able to use arrays in prepared queries is pretty much a deal breaker, as we need to `DELETE FROM _ WHERE id = ANY (?)` with massive arrays of ids, so individual queries won't cut it.
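To make the intent concrete, here is roughly what the PR would enable; a hypothetical sketch, where `int_array` stands in for whatever the array type ends up being called and `mytable` is made up:

```ocaml
(* Hypothetical sketch: int_array is a stand-in for the array type this
   PR would add, and mytable is an invented table name.  The whole id
   list travels as a single parameter instead of one query per id. *)
let delete_ids =
  let open Caqti_request.Infix in
  (int_array ->. Caqti_type.unit)
    "DELETE FROM mytable WHERE id = ANY (?)"
```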

If this is not a no-no from the get-go, this PR has some limitations:

Reasonable unit tests are included for both encoding and decoding.

mefyl commented 9 months ago

Some relevant discussion in #15

paurkedal commented 9 months ago

I would prefer not to implement something that is only available for one database system if we can avoid it, and the case you cite should be achievable using JSON strings. I tested the following on all three database systems:

```ocaml
# (* One request whose SQL is chosen per dialect: the ids travel as a
     single JSON string parameter and are unnested server-side. *)
  let req =
    let open Caqti_type.Std in
    let open Caqti_request.Infix in
    string -->* int @@:-
    (function
     | `Mysql ->
        "SELECT id.x \
         FROM json_table(?, '$[*]' COLUMNS (x integer path '$')) AS id"
     | `Pgsql ->
        "SELECT id::text::int FROM json_array_elements(?::json) AS id"
     | `Sqlite ->
        "SELECT id.value FROM json_each(?) AS id"
     | _ -> raise Not_found);;
# (* Try it against an in-memory SQLite database. *)
  Caqti_blocking.with_connection (Uri.of_string "sqlite3://") @@ fun (module C) ->
  C.collect_list req "[1, 2, 3]";;
```

Would this not fulfil the need? The queries will be more complex than with arrays, but on the other hand, this approach can be generalised to transmit rows of arbitrary types independently of the DB system, using a well-established encoding.
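To make the parameter side concrete, the id list just needs to be serialised to a JSON string; a sketch assuming Yojson (not a Caqti dependency, any JSON encoder works):

```ocaml
(* Sketch: serialise the ids to the JSON string the request above
   expects.  Yojson is an assumption here. *)
let ids_to_json ids =
  Yojson.Basic.to_string (`List (List.map (fun i -> `Int i) ids))

(* e.g.  C.collect_list req (ids_to_json [1; 2; 3]) *)
```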

paurkedal commented 9 months ago

Actually, sticking to arrays, your original query works with a string input, so it should be possible to define the array type as a product (or custom) type, encoded as a string. (One might need a `?::int[]` cast in some contexts, but that's probably the case also when sending the parameter with unknown type.)
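A minimal sketch of that suggestion, building on `Caqti_type.custom` over `string` with PostgreSQL's array-literal syntax; the decoder is deliberately naive (no quoting, no nested arrays):

```ocaml
(* Minimal sketch: an int list carried over the existing string type
   using PostgreSQL array-literal syntax.  Good enough for ids; a real
   implementation would handle quoting and nested arrays. *)
let int_array : int list Caqti_type.t =
  let encode ids =
    Ok ("{" ^ String.concat "," (List.map string_of_int ids) ^ "}")
  and decode s =
    try
      let body = String.sub s 1 (String.length s - 2) in
      Ok (if body = "" then []
          else List.map int_of_string (String.split_on_char ',' body))
    with _ -> Error ("int_array: cannot decode " ^ s)
  in
  Caqti_type.(custom ~encode ~decode string)

(* Used e.g. as
   (int_array ->. Caqti_type.unit)
     "DELETE FROM mytable WHERE id = ANY (?::int[])" *)
```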

mefyl commented 9 months ago

Thanks for the workarounds! I did manage to make it work; here are some things I learned along the way:

So while not ideal, I got it to work in an acceptable-for-now fashion.

Some general reflections on backend-agnostic SQL; feel free to skip my Saturday afternoon esoteric monologues.

I think it has become a bit of a meme that we all pick SQL-backend-agnostic middleware "just in case we want to change the backend", which is on paper a very noble intent. Yet we pretty much never actually change it, because the main SQL engines have so many custom quirks that any real-world project ends up relying on backend-specific features very early. Not doing so means sacrificing a lot of performance and/or vastly complicating the codebase just for the sake of a backend agnosticism that will never serve any actual purpose.

For these reasons we are tied to Postgres, and we use Caqti not for its backend-abstraction capabilities, but as a higher-level PostgreSQL client. Because of this, if it gatekeeps us out of too many Postgres features, I fear it will become too limiting. And maybe that's fine; perhaps we ought to use pgx directly and not ask of Caqti things it was not designed for. But long story short, my very personal point of view is that I'd prefer the library to give me access to all the backend-specific features I need, because backend-dependent features are the reality of SQL databases. It would then be up to me to use or not use them to keep all or part of my code backend-agnostic.

paurkedal commented 9 months ago

I can see the issue that parametric polymorphism will not work in this case, as the array converter does not have access to the GADT encoding of the element type. One could still implement `int_array` or whatever one needs.

I am not so sure about the overhead, since PostgreSQL is good at optimising queries, and a JSON array shouldn't be much heavier to parse than a custom, but still text-based, array encoding, at least in theory. There is also the transmission overhead to consider. But I may be wrong.

I agree that one often targets a single RDBMS in practice; I've done so myself for several applications, since it can be time-consuming to cover and test multiple dialects. Supporting multiple dialects can be made much easier with higher-level APIs, which I've had in the back of my mind when designing the Caqti API. E.g. SQLAlchemy makes this a lot easier, though I don't think an ORM is the ideal solution for functional languages (or for relational databases, for that matter). In the absence of such a solution, I'm open to pragmatism, though.

I'm wondering if we could add array support with full coverage, using native array encoding for PostgreSQL and JSON encoding for SQLite3 and MariaDB. I may do this myself at some point, starting from your PR, if you don't want to pursue it yourself now. Some benchmarking of the array vs JSON solutions for PostgreSQL would be good, to establish that part of the benefit.
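As a rough illustration, the dialect dispatch from the earlier example already extends to this; SQL and names are illustrative only:

```ocaml
(* Illustrative only.  Note that with a plain string parameter the
   expected format differs per dialect ("{1,2,3}" for the ?::int[]
   cast vs "[1,2,3]" for json_each); encoding the value appropriately
   per driver is exactly what built-in array support would handle. *)
let delete_by_ids =
  let open Caqti_type.Std in
  let open Caqti_request.Infix in
  string -->. unit @@:-
  (function
   | `Pgsql ->
      "DELETE FROM mytable WHERE id = ANY (?::int[])"
   | `Sqlite ->
      "DELETE FROM mytable \
       WHERE id IN (SELECT value FROM json_each(?))"
   | _ -> raise Not_found)
```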