paurkedal / ocaml-caqti

Cooperative-threaded access to relational data
https://paurkedal.github.io/ocaml-caqti/index.html
GNU Lesser General Public License v3.0

[WIP] Add support for array fields. #112

Closed. mefyl closed this 9 months ago.

mefyl commented 9 months ago

This preliminary work adds support for Postgresql's array fields.

AFAIK this is only supported by pg. I do realize this enables writing non-backend-agnostic queries, and I'm not sure what the policy is regarding that. In our specific case, not being able to use arrays in prepared queries is pretty much a deal breaker, as we need to `DELETE FROM _ WHERE id = ANY (?)` with massive arrays of ids, so individual queries won't cut it.
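To make the intent concrete, here is roughly what the PR would enable; a hypothetical sketch, where `int_array` stands in for whatever the array type ends up being called and `mytable` is made up:

```ocaml
(* Hypothetical sketch: int_array is a stand-in for the array type this
   PR would add, and mytable is an invented table name.  The whole id
   list travels as a single parameter instead of one query per id. *)
let delete_ids =
  let open Caqti_request.Infix in
  (int_array ->. Caqti_type.unit)
    "DELETE FROM mytable WHERE id = ANY (?)"
```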

If this is not a no-no from the get-go, this PR has some limitations:

Reasonable unit tests are included for both encoding and decoding.

mefyl commented 9 months ago

Some relevant discussion in #15

paurkedal commented 9 months ago

I would prefer not to implement something that is only available for one database system if we can avoid it, and the case you cite should be achievable using JSON strings. I tested the following on all three database systems:

```ocaml
# (* One request whose SQL is chosen per dialect: the ids travel as a
     single JSON string parameter and are unnested server-side. *)
  let req =
    let open Caqti_type.Std in
    let open Caqti_request.Infix in
    string -->* int @@:-
    (function
     | `Mysql ->
        "SELECT id.x \
         FROM json_table(?, '$[*]' COLUMNS (x integer path '$')) AS id"
     | `Pgsql ->
        "SELECT id::text::int FROM json_array_elements(?::json) AS id"
     | `Sqlite ->
        "SELECT id.value FROM json_each(?) AS id"
     | _ -> raise Not_found);;
# (* Try it against an in-memory SQLite database. *)
  Caqti_blocking.with_connection (Uri.of_string "sqlite3://") @@ fun (module C) ->
  C.collect_list req "[1, 2, 3]";;
```

Would this not fulfil the need? The queries will be more complex than with arrays, but on the other hand, this approach can be generalised to transmit rows of arbitrary types independently of the DB system, using a well-established encoding.
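To make the parameter side concrete, the id list just needs to be serialised to a JSON string; a sketch assuming Yojson (not a Caqti dependency, any JSON encoder works):

```ocaml
(* Sketch: serialise the ids to the JSON string the request above
   expects.  Yojson is an assumption here. *)
let ids_to_json ids =
  Yojson.Basic.to_string (`List (List.map (fun i -> `Int i) ids))

(* e.g.  C.collect_list req (ids_to_json [1; 2; 3]) *)
```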

paurkedal commented 9 months ago

Actually, sticking to arrays, your original query works with a string input, so it should be possible to define the array type as a product (or custom) type, encoded as a string. (One might need a `?::int[]` cast in some contexts, but that's probably the case also when sending the parameter with unknown type.)
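A minimal sketch of that suggestion, building on `Caqti_type.custom` over `string` with PostgreSQL's array-literal syntax; the decoder is deliberately naive (no quoting, no nested arrays):

```ocaml
(* Minimal sketch: an int list carried over the existing string type
   using PostgreSQL array-literal syntax.  Good enough for ids; a real
   implementation would handle quoting and nested arrays. *)
let int_array : int list Caqti_type.t =
  let encode ids =
    Ok ("{" ^ String.concat "," (List.map string_of_int ids) ^ "}")
  and decode s =
    try
      let body = String.sub s 1 (String.length s - 2) in
      Ok (if body = "" then []
          else List.map int_of_string (String.split_on_char ',' body))
    with _ -> Error ("int_array: cannot decode " ^ s)
  in
  Caqti_type.(custom ~encode ~decode string)

(* Used e.g. as
   (int_array ->. Caqti_type.unit)
     "DELETE FROM mytable WHERE id = ANY (?::int[])" *)
```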

mefyl commented 9 months ago

Thanks for the workarounds! I did manage to make it work; here are some things I learned along the way:

So while not ideal, I got it to work in an acceptable-for-now fashion.

Some general reflections on backend-agnostic SQL; feel free to skip my Saturday afternoon esoteric monologues.

I think it has become a bit of a meme that we all pick SQL-backend-agnostic middleware "just in case we want to change the backend", which is on paper a very noble intent. Yet we pretty much never actually change it, because the main SQL engines have so many custom quirks that any real-world project ends up relying on backend-specific features very early. Not doing so means sacrificing a lot of performance and/or vastly complicating the codebase just for the sake of a backend agnosticism that will never serve any actual purpose.

For these reasons we are tied to Postgres, and we use Caqti not for its backend-abstraction capabilities, but as a higher-level PostgreSQL client. Because of this, if it gatekeeps us out of too many Postgres features, I fear it will become too limiting. And maybe that's fine; perhaps we ought to use pgx directly and not ask of Caqti things it was not designed for. But long story short, my very personal point of view is that I'd prefer the library to give me access to all the backend-specific features I need, because backend-dependent features are the reality of SQL databases. It would then be up to me to use or not use them to keep all or part of my code backend-agnostic.

paurkedal commented 9 months ago

I can see the issue that parametric polymorphism will not work in this case, as the array converter does not have access to the GADT encoding of the element type. One could still implement `int_array` or whatever one needs.

I am not so sure about the overhead, since PostgreSQL is good at optimising queries, and a JSON array shouldn't be much heavier to parse than a custom, but still text-based, array encoding, at least in theory. There is also the transmission overhead to consider. But I may be wrong.

I agree that one often targets a single RDBMS in practice; I've done so myself for several applications, since it can be time-consuming to cover and test multiple dialects. Supporting multiple dialects can be made much easier with higher-level APIs, which I've had in the back of my mind when designing the Caqti API. E.g. SQLAlchemy makes this a lot easier, though I don't think an ORM is the ideal solution for functional languages (or for relational databases, for that matter). In the absence of such a solution, I'm open to pragmatism, though.

I'm wondering if we could add array support with full coverage, using native array encoding for PostgreSQL and JSON encoding for SQLite3 and MariaDB. I may do this myself at some point, starting from your PR, if you don't want to pursue it yourself now. Some benchmarking of the array vs JSON solutions for PostgreSQL would be good, to establish that part of the benefit.
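As a rough illustration, the dialect dispatch from the earlier example already extends to this; SQL and names are illustrative only:

```ocaml
(* Illustrative only.  Note that with a plain string parameter the
   expected format differs per dialect ("{1,2,3}" for the ?::int[]
   cast vs "[1,2,3]" for json_each); encoding the value appropriately
   per driver is exactly what built-in array support would handle. *)
let delete_by_ids =
  let open Caqti_type.Std in
  let open Caqti_request.Infix in
  string -->. unit @@:-
  (function
   | `Pgsql ->
      "DELETE FROM mytable WHERE id = ANY (?::int[])"
   | `Sqlite ->
      "DELETE FROM mytable \
       WHERE id IN (SELECT value FROM json_each(?))"
   | _ -> raise Not_found)
```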