ts-safeql / safeql

Validate and auto-generate TypeScript types from raw SQL queries in PostgreSQL.
https://safeql.dev
MIT License

JSONB columns inferred as `any` #212

Closed klausbadelt closed 6 months ago

klausbadelt commented 6 months ago

Is your feature request related to a problem? Please describe.
The recent json/b-related update #193 improved JSONB expressions, but JSONB columns are still inferred as any. Implicitly typed queries - using a custom type for a JSONB column - should not fail with 'Query has incorrect type annotation' for JSONB columns.

Problem is: where do we store the type?

Describe the solution you'd like
We need a way to describe the type of a JSONB column. The existing connections.overrides could be extended with a columns entry, with "[table].[column]" as the key and the type as the value. Example:

{
  "connections": {
    "overrides": {
      "columns": {
        "User.settings": "IUserSettings",
        "Movie.video": "IMovieVideo",
      }
    }
  }
}

with interfaces/types IUserSettings and IMovieVideo defined elsewhere.
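
For illustration, a hypothetical sketch of what this could look like in application code (the interface shape and the sql`` query below are made-up assumptions, not part of the proposal):

interface IUserSettings {
  theme: "light" | "dark";
  locale: string;
}

// With "User.settings" mapped to IUserSettings in the config above, SafeQL
// would accept this annotation instead of inferring the jsonb column as `any`:
sql<{ id: number; settings: IUserSettings }>`
  SELECT id, settings FROM "User"
`;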

Describe alternatives you've considered

  1. Can't find a workaround with overrides.types.jsonb: it only supports one (1) single type for all JSONB columns - a rare use case.
  2. Types as database comments are, IMHO, "code in the database" - an anti-pattern, not convenient, and it "feels wrong".
klausbadelt commented 6 months ago

Relevant: #190 #102

karlhorky commented 6 months ago
  1. Can't find a workaround with overrides.types.jsonb: it only supports one (1) single type for all JSONB columns

One workaround that we're using with overrides.types.json = 'JsonAgg' is to split queries across multiple files, which allows different types to use the same name:

Configuration:

{
  "connections": {
    // ...
    "overrides": {
      "types": {
        "json": "JsonAgg"
      }
    }
  }
}

file1.ts

type JsonAgg = {
  id: number;
  name: string;
}

// sql`` query somewhere below

file2.ts

type JsonAgg = {
  id: number;
  count: number;
}

// sql`` query somewhere below
karlhorky commented 6 months ago

Thinking it through, this may actually even work for a particular scope within a file - e.g. declaring type JsonAgg inside each function scope (although I haven't tried this yet):

file1.ts

function a() {
  type JsonAgg = {
    id: number;
    name: string;
  };

  // sql`` query somewhere below
}

function b() {
  type JsonAgg = {
    id: number;
    count: number;
  };

  // sql`` query somewhere below
}
Newbie012 commented 6 months ago

There were two suggestions in the past:

  1. Overriding the column's type via an overrides config
  2. Retrieving the column's interface name from its schema comment

Both have their pros and cons, but both lack the guarantee that the data matches the type that you defined. Implementing both solutions would be trivial, but they feel wrong and may eventually lead to confusion once the data doesn't match the interface.

In my codebase, I set json/b as unknown and then parse it using other tools (such as ajv, zod, etc.). I already had cases where the parsing failed due to incorrect schema validation (e.g., someone adding a property to an interface that is referenced from the JSON interface without knowing it).
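
A minimal sketch of that workflow, assuming zod and the same sql`` tag used elsewhere in this thread (the table, column, and schema are made up):

import { z } from "zod";

// With json/b overridden to "unknown", the column arrives untyped:
const userSettingsSchema = z.object({
  theme: z.string(),
  notifications: z.boolean(),
});

const rows = await sql<{ id: number; settings: unknown }>`
  SELECT id, settings FROM "User"
`;

// Runtime validation catches drift between the stored data and the expected
// shape; parse() throws if the jsonb no longer matches the schema.
const settings = userSettingsSchema.parse(rows[0].settings);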

karlhorky commented 6 months ago

Both have their pros and cons, but both lack the guarantee that the data matches the type that you defined. Implementing both solutions would be trivial, but they feel wrong and may eventually lead to confusion once the data doesn't match the interface.

Hm, this makes sense (although that may be a tradeoff worth making, at least for a first version). The current json / jsonb workflows feel like SafeQL is missing a piece (I haven't seen this much elsewhere yet - it works quite well!).

A recommendation to use a runtime validation step with Zod or similar - completely separate from SafeQL - would double down on this feeling I think.

So maybe for a first version, one of these suggestions would be an acceptable tradeoff (maybe with a docs note about it being a "sharp edge" to be used carefully).

Future / More Correct

In the future, two other ideas for more correctness:

  1. Integration of SafeQL with Zod or some other validator (only for these unknown fields), based on the existing type passed to SafeQL
  2. Any way of getting the "true" PostgreSQL data type behind the part of the query which leads to json / jsonb data - e.g. from libpg-query (potentially with another query for the JSON part? maybe with json_typeof? see the sketch below)
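
For reference, json_typeof / jsonb_typeof only report the top-level JSON type ('object', 'array', 'string', 'number', 'boolean', or 'null'), so a probe along these lines could classify a column but not recover its full shape (table and column names are assumptions):

// Hypothetical probe query, not an existing SafeQL mechanism:
sql<{ settings_type: string | null }>`
  SELECT jsonb_typeof(settings) AS settings_type
  FROM "User"
  LIMIT 1
`;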
klausbadelt commented 6 months ago

Hey thank you!

The type JsonAgg "trick" could be a workaround, although we don't want to re-define our column types in every file. I haven't tried it, but would importing the types perhaps put them into scope "enough" for the override to work? (I'll try it if you don't have the answer ready.)
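
For what it's worth, the untested idea would look roughly like this (the module path and type name are made up); whether an imported alias counts as "in scope" for the override is exactly the open question:

// file1.ts - import the shared column type under the overridden name
// instead of redefining JsonAgg locally in every file.
import type { IUserSettings as JsonAgg } from "./types";

// sql`` query somewhere below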

klausbadelt commented 6 months ago

This type JsonAgg idea doesn't work for multiple, different JSONB columns in the same query though. So, unfortunately not a workaround.

JSONB columns are where types are especially useful though.

Since there's no '"true" PostgreSQL data type behind the part', and while Zod-inferred types are generally a good idea IMHO, maybe a third idea: let us override unknown without a lint error?

Newbie012 commented 6 months ago

I'm not a fan of JsonAgg for a few reasons:


Any way of getting the "true" PostgreSQL data type behind the part of the query which leads to json / jsonb data - eg. from libpg-query (potentially with another query for the JSON part? maybe with json_typeof?)

SafeQL doesn't run the queries it analyzes (and it shouldn't). Thus, it doesn't know the shape of the column. But even if it did run the queries, JSON can be very dynamic and complex. If the data structure is important, I would (personally) recommend normalizing the data rather than using json/b columns.
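
As a generic illustration only (the table, columns, and ${userId} parameter are made up), normalizing a jsonb settings column into its own table lets SafeQL infer the scalar columns directly, without any json/b overrides:

// Assume a normalized table instead of a jsonb column:
//   user_settings(user_id int PK, theme text NOT NULL, notifications boolean NOT NULL)

// The scalar columns are inferred by SafeQL as string / boolean:
sql<{ theme: string; notifications: boolean }>`
  SELECT theme, notifications FROM user_settings WHERE user_id = ${userId}
`;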


Other libraries such as DrizzleORM allow defining types without runtime type checking, and people are fine with it. Given that, I believe it would be okay to introduce something like overrides.columns, just as the OP suggested.

Thoughts?

klausbadelt commented 6 months ago

From idea, through comments, implementation, documentation, and PR, to merged & released in under 18 hours. Wow @Newbie012, I'm very impressed. Thank you! 🙇

karlhorky commented 6 months ago

@Newbie012 There were two suggestions in the past [(overriding the column's type w. overrides config and retrieving the column's interface name from its schema comment)]

@karlhorky So maybe for a first version, one of these suggestions would be an acceptable tradeoff (maybe with a docs note about it being a "sharp edge" to be used carefully).

@Newbie012 I believe it would be okay to introduce something like overrides.columns just like the OP suggested

Yeah, for a first version, seems like it's an ok tradeoff.

Probably adding a note to the docs that this is actually not checking the type of the json/b columns is a good idea, so that users know what they're getting into.

Personally I'm a fan of keeping the database code as the source of truth for the types (e.g. the "comments" option, similar to how Kanel does it), but as long as there's a way to do it in the first version, that's good enough for an MVP.

karlhorky commented 6 months ago

Future / More Correct

@karlhorky 2. Any way of getting the "true" PostgreSQL data type behind the part of the query which leads to json / jsonb data - eg. from libpg-query (potentially with another query for the JSON part? maybe with json_typeof?)

@Newbie012 SafeQL doesn't run the queries it analyzes (and it shouldn't). Thus, it doesn't know the shape of the column. But even if it would run the queries, JSON could be very dynamic and complex.

It's my understanding that SafeQL gets diagnostic information about the queries from PostgreSQL via libpg-query - I wasn't suggesting that queries be run, but rather that SafeQL retrieves the same diagnostic information with some kind of fragment or derived form of the JSON parts, potentially using functions like json_typeof.

Or maybe there is work, either already started or that could be proposed, for PostgreSQL / libpg-query to return extra diagnostics for json/b fields. It seems like an arbitrary limitation that the diagnostic information stops at this boundary.

karlhorky commented 6 months ago

If the data structure is important, I would (personally) recommend normalizing the data rather than using json/b columns.

What would this look like, given the following query? Would you make two "waterfall" SQL queries, round-tripping between JS and SQL (JS -> SQL -> JS -> SQL -> JS)? If possible, I'd love to avoid that.

sql<{ id: number; locations: { id: number; label: string }[]; }>`
  SELECT
    id,
    (
      SELECT
        jsonb_agg(jsonb_build_object(
          'id', locations.id,
          'label', locations.label
        ))
      FROM
        locations
        LEFT JOIN candidates_locations ON (
          locations.id = candidates_locations.location_id
          AND candidates_locations.candidate_id = candidates.id
        )
      WHERE
        candidates_locations.location_id IS NOT NULL
    ) AS locations
  FROM
    candidates
  WHERE
    slug = ${slug}
`
Newbie012 commented 6 months ago

Probably adding a note to the docs that this is actually not checking the type of the json/b columns is a good idea, so that users know what they're getting into.

I agree. PRs would be welcome.

Personally I'm a fan of https://github.com/ts-safeql/safeql/issues/102#issuecomment-1833386182 (eg. https://github.com/ts-safeql/safeql/issues/102#issuecomment-1833196632, similar to how Kanel does it), but as long as there's a way to do it in the first version, that's good enough for an MVP.

It can be implemented as well, although it has its own limitations. While it could be ideal for small apps, in large organizations where different codebases interact with the same database it may lead to type collisions. As of now, there's no convention for typing columns using SQL comments.


Regarding the query above, there's nothing wrong with returning JSON/B. I was referring only to storing data in and selecting from JSON/B column data types. As long as PostgreSQL is able to plan and execute your query efficiently, I don't see anything wrong with it. I also believe that SafeQL v3 should be able to type the query as expected.