nikita-volkov / hasql

The fastest PostgreSQL libpq-based driver for Haskell
http://hackage.haskell.org/package/hasql
MIT License
519 stars 55 forks source link

Type checking against the database schema #134

Open szg251 opened 4 years ago

szg251 commented 4 years ago

I've been experimenting with postgresql-typed and I really liked the concept of compile time type checking against the DB schema. Is there such a feature in mind for hasql?

In my opinion it doesn't have to be compile time, it could be even better, if it's a standalone script or library. My reasoning is:

nikita-volkov commented 4 years ago

It could be implemented as another extension library but I don't have any plans on doing it myself.

I find this approach fragile, since it introduces a compile time dependency on environment. Just changing the environment (e.g., by running the build on a different machine) will break a build which previously ran fine. This means that with such dependency compilation is no longer reproducible. This may easily cause a maintenance hell.

hugodro commented 3 years ago

I completely love the Hasql package (so thanks again for your contribution!)... but not having a compile-time check of the SQL schema is now the single biggest source of problems I run into in real life. Any pointers on how one would go ahead with an extension or integration for Hasql to support a schema definition? I've implemented a SQL compiler in the past so the parsing part is not a problem, but I don't know enough Haskell/Templates to just translate previous work into Hasql. And I'd personally prefer to put the DDL syntax into Hasql rather than have the db schema loaded from a live access to the db server (at least initially, probably one will lead to the other).

nikita-volkov commented 3 years ago

One thing you can do is take a look at what they've done in postgresql-typed.

Generally I recommend setting very clear goals for your project, because it can easily turn into a lot of not really needed work. Why-questions help.

szg251 commented 3 years ago

I would love to see something like a property testing tool, instead of a compile-time check. I agree with @nikita-volkov in that, the latter can get painful with different environments. I would also love to help implementing it, but unfortunately I don't have any time for open source now.

hugodro commented 3 years ago

I'm not sure if my "validate at compile time" idea is clear. I'm thinking of a syntax like the following (with the caveat that I don't know much about poly-types and templates and uppercase/lowercase, etc...):

-- Copy/paste the table definition code into a DDL type/template engine:
instance Validator CustomerDB where
  [ddl|
    create table Customer (
      id serial primary key :: int4
      , createdOn timestamp with time zone default now() :: timestampz
      , firstName varchar(32) :: text
      , lastName varchar(32) :: text
     );
    create table AddressCust (
      id serial primary key :: int4
      , customerfk int not null references Customer(id) :: int4
      , firstLine varchar(256) :: text
    );
 |]

-- Make sure that all the fields are consistent with the table definitions (and skip fields type-annotation  as they have been provided already in the DDL):
selectCustomerByID :: Int32 -> Session (Maybe (Text, Text, Text))
selectCustomerByID anID =
  Session.statement (anID) $ [maybeStatement |
     select
       firstname, lastname, firstline
     from
       customer a join addresscust b on a.id = b.customerfk
  |] :: CustomerDB -> Statement

Everything is self-contained within the haskell/hasql code, no dependency on external DB server. It simplifies the writing of each statement by centralising the type annotations in a single place. And it's activated on a statement-per-statement basis, so no impact on existing code.

nikita-volkov commented 3 years ago

This is not gonna be easy to implement. You'll need a DDL parser and SQL interpreter to be able to validate statements. That's on top of all the design you'll need to do.

However there's a more serious concern here of whether such a project really makes sense. Think about it. If you have your Haskell code control the schema of the database, it is likely a sign that you intend to use this database only in that Haskell codebase. Otherwise it would be a very awkward devops setup. That raises the question of whether you really need a language-agnostic relational database running as a separate server at all. Could it be that you're really looking for a storage engine designed for Haskell?

You see, when you find that you need to bend something very hard to adapt to your needs, it's often a sign that you're trying to use something for a purpose that it was not intended for, meaning that you need to be prepared for a lot of accidental problems and complexity as the result.

JivanRoquet commented 3 years ago

I tend to agree that compile-time checking should assume that all fields in the dB are consistent with what Haskell expects. Validating that this is actually true with regards to the real-world database should be the concern of automated tests, not compile-time checking.

imoverclocked commented 3 years ago

I think a schema (maybe checked in/versioned) should be a valid way to ensure all of the details (eg: nullable) line up correctly through a codebase. Doing it any other way leaves the door open for runtime errors that could have otherwise been detected by the compiler.

How the schema is managed (inside the repo for small projects or outside the repo for larger-scale projects with proper dev-ops) is a separate concern.

Moving this to a runtime check (on connect?) is an interesting problem. I think this approach would be more troublesome in a production setting since a db schema migration that fails the check would prohibit any instances of the currently deployed code from restarting, leading to a potentially unrecoverable cascade failure.

hugodro commented 2 years ago

I'm not sure we're in synch on the idea: what I'm looking for is not to use embedded DDL to control or validate what is in the DB. Rather I'm looking for a way to also explain of the schemas within the haskell code so that the embedded sql can be validated at compile-time not only for syntax correctness but also for field type correctness in a compact way. In a way it's taking all the "::<type" annotations from every embedded sql statement, and consolidating that into a single place.

Then way the actual DB table definitions become embedded into the haskell code is another topic; ie either I need to type the DDL code by hand by looking at a \dt dump or there is some tool that can read the sql definitions from the DB or text dump and then generate the relevant haskell code is something else (and of course I'd be looking at making that tool that automatically link the DB engine to the haskell embedded definitions...).