jawj / zapatos

Zero-abstraction Postgres for TypeScript: a non-ORM database library
https://jawj.github.io/zapatos/

column name camel case conversion #9

Open moltar opened 4 years ago

moltar commented 4 years ago

Hey, it's me again :)

Would it be possible to add camel case conversion for column names?

E.g. the column name is foo_bar, but in TS/JS, it is more common to use camelCase.

jawj commented 4 years ago

Hi again! Hmm — I feel a bit conflicted on this.

On the one hand, you're right, the conventions in TS and SQL are different, which is a bit of a pain.

On the other hand, this would require modifications (and extra code) in quite a lot of places, and also seems potentially like the sort of confusing ORM magic I'm keen to avoid.

So I think this is probably a WONTFIX, but I'll keep it open in case others would like to chime in.

jawj commented 4 years ago

I've thought about this some more, and am pretty definite I don't want to build this into the query mechanism (we'd need to recursively process nested select results and it would get pretty messy).

I'm considering whether some helper functions/types would be worth having, though. The typing is pretty hairy, but something like this could work (where a_b, c_d, and ef stand in for column names and table names):

type _ToA<T> =  // convert predefined string and object key types from snake_case to camelCase
  T extends 'a_b' | 'c_d' ? {
    a_b: 'aB';
    c_d: 'cD';
  }[T] :
  T extends string ? T :
  T extends { [k: string]: any } ?
  Omit<T, 'a_b' | 'c_d'> &
  ('a_b' extends keyof T ? { aB: T['a_b'] } : {}) &
  ('c_d' extends keyof T ? { cD: T['c_d'] } : {}) :
  never;

const x = 'c_d';

let c: _ToA<typeof x>;  // c has type 'cD'

const y = {
  a_b: 1,
  c_d: 'x',
  ef: true,
};

let d: _ToA<typeof y>;  // d has type { aB: number; cD: string; ef: boolean }

_ToA (and a corresponding ATo_) would of course be generated using a full list of all snake_case column and table names when running npx zapatos.

The functions to do the conversion are of course simple RegExp replacements, something along these lines: s => s.replace(/_(.)/g, m => m[1].toUpperCase()) and s => s.replace(/[A-Z]/g, m => '_' + m[0].toLowerCase()), directly or applied to Object.keys. And you would use these to wrap/unwrap values going into and coming out of the database.
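
For illustration, here's a minimal sketch of those replacements applied to object keys (the helper names are purely illustrative, not part of the library):

const snakeToCamel = (s: string) => s.replace(/_(.)/g, m => m[1].toUpperCase());
const camelToSnake = (s: string) => s.replace(/[A-Z]/g, m => '_' + m[0].toLowerCase());

// apply a key transform to a flat object, e.g. a row going into or out of the db
const mapKeys = (obj: Record<string, unknown>, f: (s: string) => string) =>
  Object.fromEntries(Object.entries(obj).map(([k, v]) => [f(k), v]));

mapKeys({ author_id: 1, created_at: new Date() }, snakeToCamel);
// => { authorId: 1, createdAt: [Date] }
mapKeys({ authorId: 1, createdAt: new Date() }, camelToSnake);
// => { author_id: 1, created_at: [Date] }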

Does this sound useful? I think it might be a bit of a drag to use, and obviously also adds some computational cost.

moltar commented 4 years ago

That's gnarly! 😁

I don't think that it is worth it.

Let's close this?

moltar commented 4 years ago

Thanks for looking into this btw!

jawj commented 4 years ago

Sure, and no problem. :)

michael-land commented 3 years ago

we'd need to recursively process nested select results and it would get pretty messy

Hi, @jawj I forked your repo and played with it.

For the shortcuts, it seems like this could be solved with a few case transforms; nested lateral joins work out of the box.

For raw queries executed with db.sql, node-postgres already supports camelCase:

pool.on('connect' as any, (client: any) => {
  //use the connection to alter the rowDescription for your query
  client.connection.on('rowDescription', (msg:any) => {
    for (const field of msg.fields) {
      field.name = camelCase(field.name);
    }
    client.activeQuery.handleRowDescription(msg);
  });
});

Test queries:

  const variants = await db
    .select(
      'listing_variant',
      {
        camelCaseField0: 'active',
        camelCaseField1: 'new',
        camelCaseField2: dc.between(20000, 49999),
      },
      {
        columns: [
          'id',
          'camelCaseField1',
          'camelCaseField2',
        ],
        lateral: {
          listing: db.selectExactlyOne(
            'listing',
            { id: db.parent('listingId') },
            { columns: ['id', 'camelCaseField'] }
          ),
        },
      }
    )
    .run(pool);
  await db
    .update(
      'listing',
      {
        camelCaseField0: 1,
        camelCaseField1: 2,
      },
      { id: variants[0].listingId },
      { returning: ['camelCaseField1'] }
    )
    .run(pool);

It's just a quick test to see if camelCase is possible. Would you reconsider integrating it into the core lib? Or add a hooks/plugins mechanism to let developers handle it?

jgonera commented 3 years ago

+1 to a middleware mechanism like Slonik's interceptors. I like Zapatos' design more than Slonik's but my code base assumes camelCase everywhere now.

filipecatraia commented 3 years ago

I mean, this is going to sound obvious, but why not just give camelCase names to your tables? I've decided to use camelCase everywhere in my SQL, I mean, why not? Conventions are arbitrary. Sorry if I sound defensive, but I tested out 8 ORMs before finally settling on zapatos, and I'd like it to remain simple, maintained, and so on 😄

jgonera commented 3 years ago

@filipecatraia It's against Postgres conventions. Most people working with a Postgres database would expect to see columns named using underscores. If there's more than one service using the same database, or more than one team working with it, most likely other developers are not going to be on board with camelCase. There are also legacy databases where you're not going to rename everything.

Also, many people advise against camelCase because names in Postgres are not case-sensitive by default, so columnName and columnname are the same, which can lead to bugs. You can use double quotes in your queries to avoid this, but that's another thing to remember.

jawj commented 3 years ago

Right: unless we implement some sort of conversion routines, you have to pick which side (TypeScript or Postgres) is going to stay consistent with normal conventions and which side isn't. Everywhere I've used the library so far I've used camel case, to be consistent with the TypeScript code base. It does mean you have to double quote stuff when querying manually, but I do a lot less of that than writing code, and it's not that bad.

On the other side, since Zapatos will autocomplete most of your column names in TypeScript, having underscores there is also not a huge deal.

I should probably look into what Slonik does, but my feeling is that consistency with these conventions isn't worth complicating things to any great degree.

I already feel like the complexity of the library could grow out of hand ...

jawj commented 3 years ago

I tested out 8 ORMs before finally settling on zapatos

@filipecatraia It's good to hear that Zapatos made the grade!

jgonera commented 3 years ago

On the other side, since Zapatos will autocomplete most of your column names in TypeScript, having underscores there is also not a huge deal.

I have an ESLint rule that enforces camelCase and I'd have to write // eslint-disable-next-line camelcase before each line that breaks that rule.

jgonera commented 3 years ago

I looked into it briefly, and I think Zapatos wouldn't need to write a different schema from what it generates now: the whole conversion could be done with some sort of helper functions, e.g.:

import _camelcaseKeys from 'camelcase-keys';
import { selectExactlyOne } from 'zapatos/db';

type CamelCaseKey<T extends PropertyKey> = T extends string
  ? string extends T
    ? string
    : T extends `${infer F}_${infer R}`
    ? `${F}${Capitalize<CamelCaseKey<R>>}`
    : T
  : T;

type CamelCase<T> = { [K in keyof T as CamelCaseKey<K>]: T[K] };

function camelCaseKeys<T>(data: T): CamelCase<T> {
  return (_camelcaseKeys(data) as unknown) as CamelCase<T>;
}

// ***

camelCaseKeys(
  await selectExactlyOne('users', { email: 'something@test.com' }).run(pool),
);

(this is possible in TypeScript 4.1+, credit to https://stackoverflow.com/a/63715429/365238 and https://stackoverflow.com/a/65642944/365238)

It could be made prettier by accepting a Promise, and by supporting arrays of objects and nested objects, although for nested objects getting the type definitions right might get tricky (I'm no TypeScript expert, just a gut feeling).

Assuming this is all doable, it's a bit tedious to use this for every database call (+ snakeCaseKeys for a reverse conversion for query variables), so I'd still see a benefit of Zapatos providing some sort of interceptor mechanism where I could just plug these functions in. Otherwise I'd probably need to write a wrapper over the entire public API of Zapatos and then make sure that it stays in sync.
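
For completeness, a hypothetical snakeCaseKeys along the same lines (assuming the snakecase-keys package for the runtime conversion; the types are mine and only handle flat objects):

import _snakecaseKeys from 'snakecase-keys';

type SnakeCaseKey<T extends string> = T extends `${infer F}${infer R}`
  ? F extends Lowercase<F>
    ? `${F}${SnakeCaseKey<R>}`
    : `_${Lowercase<F>}${SnakeCaseKey<R>}`
  : T;

type SnakeCase<T> = { [K in keyof T as K extends string ? SnakeCaseKey<K> : K]: T[K] };

function snakeCaseKeys<T extends Record<string, unknown>>(data: T): SnakeCase<T> {
  return (_snakecaseKeys(data) as unknown) as SnakeCase<T>;
}

// snakeCaseKeys({ authorId: 123 }) is typed as { author_id: number }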

jgonera commented 3 years ago

FWIW, I had some issues with the implementations of those conversion types from the StackOverflow threads I mentioned. I found the types from https://github.com/sindresorhus/type-fest working well so far. This is what I use:

import { CamelCasedProperties, SnakeCasedProperties } from 'type-fest';

// converts properties of objects and objects in arrays to camelCase
export type Camel<T> = T extends Array<infer U>
  ? Array<Camel<U>>
  : CamelCasedProperties<T>;

// converts properties of objects and objects in arrays to snake_case
export type Snake<T> = T extends Array<infer U>
  ? Array<Snake<U>>
  : SnakeCasedProperties<T>;

type-fest also has CamelCasedPropertiesDeep and SnakeCasedPropertiesDeep but I found those to be too aggressive (e.g. converting properties of Date).
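
As a quick illustrative check (Row here is just a hypothetical shape):

// hypothetical row shape, to show what the helpers produce
type Row = { author_id: number; created_at: Date };

type A = Camel<Row[]>;  // Array<{ authorId: number; createdAt: Date }>
type B = Snake<{ authorId: number; createdAt: Date }>;  // { author_id: number; created_at: Date }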

jawj commented 2 years ago

I had another look at this recently, and decided to see what it might look like. You can try out the experimental implementation on a new case-transforms branch. It's not as nasty as I feared.

How it works

Note: there must be no capital letters in any of your Postgres schema/table/column/enum names, or case conversions won't be reversible and Bad Things are guaranteed to happen.

First, set "nameTransforms": true in zapatosconfig.json. Zapatos will then convert all snake_case names to camelCase when writing schema.d.ts (no fancy mapping types required).

Then, set "nameTransforms": true in a call to db.setConfig at runtime. This does three key things (a brief usage sketch follows the list):

  1. All strings interpolated into SQLFragments (which have always been treated as schema/table/column identifiers) are automatically case-transformed. For example, db.sql`SELECT ${"myTextColumn"} FROM ${"mySchema.myTable"}` will compile to SELECT "my_text_column" FROM "my_schema"."my_table". Since the shortcut functions, Whereables and so on all use this same mechanism, this change deals with all identifiers heading into the database.
  2. The default runResultTransform function for SQLFragments, which was previously just (queryResult) => queryResult.rows, now automatically makes the reverse transformation. So the above query returns you { myTextColumn: string }[], as promised in schema.d.ts. This should deal with all basic manual queries that return table-like results.
  3. The shortcut functions are a bit more involved, since they put together JSON structures (which may be nested) within Postgres itself, using to_jsonb and jsonb_build_object. For these, we insert some fancy SQL subqueries that convert the relevant JSON object keys as part of the generated SQL. (Happily, generating JSON objects was already centralised in one place, so this was a very localised change).
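
Putting points 1 and 2 together, a brief usage sketch (pool is an assumed pg.Pool, and the table and column names are invented):

import * as db from 'zapatos/db';
import pool from './pgPool';  // assumed pg.Pool

db.setConfig({ nameTransforms: true });  // experimental: case-transforms branch only

// identifiers are transformed on the way in, result keys on the way out
const rows = await db.sql`SELECT ${"myTextColumn"} FROM ${"mySchema.myTable"}`.run(pool);
// compiles to: SELECT "my_text_column" FROM "my_schema"."my_table"
// and rows comes back with myTextColumn keys, as promised in schema.d.ts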

Note that if you use to_jsonb, jsonb_build_object or (in some circumstances) json_agg in manual SQL queries, the same fancy subqueries are available as SQLFragments that you can embed there.

Custom transforms

It's actually possible to set up custom transformations, to do things other than converting snake_case <-> camelCase. It's slightly complex: instead of just passing true or false to nameTransforms, you provide an object implementing the NameTransforms interface. That has TypeScript functions to transform identifiers in both directions, and also functions that generate SQLFragment snippets that will (a) transform the keys of an arbitrary JSON object (for shortcut queries returning *) and (b) build a JSON object with appropriately transformed keys (an optimization for shortcut queries returning named columns).

When you set "nameTransforms": false, you get the nullTransforms, which keep everything the same, like so:

export const
  nullTransforms: NameTransforms = {
    ts: {
      fromPgToTs: noop,
      fromTsToPg: noop,
    },
    pg: {
      fromPgToTs: noop,  // currently unused
      fromTsToPg: noop,  // currently unused
      namedColumnsJSON: columns => sql`jsonb_build_object(${mapWithSeparator(columns, sql`, `, c => sql`${param(c)}::text, ${c}`)})`,
      allColumnsJSON: table => sql`to_jsonb(${table}.*)`,
    },
  };

When you set "nameTransforms": true, you get the snakeCamelTransforms instead, which look like this:

export const
  snakeToCamelFn = (s: string) => s.replace(/_[a-z]/g, m => m.charAt(1).toUpperCase()),
  camelToSnakeFn = (s: string) => s.replace(/[A-Z]/g, m => '_' + m.toLowerCase()),
  snakeToCamelSQL = (s: SQLFragment) => sql`(
    SELECT string_agg(CASE WHEN i = 1 THEN s ELSE upper(left(s, 1)) || right(s, -1) END, NULL) 
    FROM regexp_split_to_table(${s}, '_(?=[a-z])') WITH ORDINALITY AS rstt(s, i)
  )`,
  camelToSnakeSQL = (s: SQLFragment) => sql`(
    SELECT string_agg(CASE WHEN i = 1 THEN right(s, -1) ELSE lower(left(s, 1)) || right(s, -1) END, '_')
    FROM regexp_split_to_table('~' || ${s}, '(?=[A-Z])') WITH ORDINALITY AS rstt(s, i)
  )`,
  snakeCamelTransforms: NameTransforms = {
    ts: {
      fromPgToTs: snakeToCamelFn,
      fromTsToPg: camelToSnakeFn,
    },
    pg: {
      fromPgToTs: snakeToCamelSQL,  // currently unused
      fromTsToPg: camelToSnakeSQL,  // currently unused
      namedColumnsJSON: columns => sql`jsonb_build_object(${mapWithSeparator(columns, sql`, `, c => sql`${param(c)}::text, ${camelToSnakeFn(c)}`)})`,
      allColumnsJSON: table => sql`(SELECT jsonb_object_agg(${snakeToCamelSQL(sql`${raw('key')}`)}, value) FROM jsonb_each(to_jsonb(${table}.*)))`,
    },
  };

Examples

So, how does it look?

Manual select

const
  authorId = 1,
  query = db.sql<s.books.SQL>`
    SELECT * FROM ${"books"} WHERE ${{ authorId }}`,
  existingBooks: s.books.Selectable[] = await query.run(pool);
SELECT * FROM "books" WHERE ("author_id" = $1)
=> [
  {
    id: 121,
    authorId: 1,
    title: 'Pride and Prejudice',
    createdAt: 2022-06-08T21:29:50.246Z,
    updatedAt: 2022-06-08T21:29:50.246Z
  }
]

Manual insert

const
  newBook: s.books.Insertable = {
    authorId: 123,
    title: "One Hundred Years of Solitude",
  },
  query = db.sql<s.books.SQL>`
    INSERT INTO ${"books"} (${db.cols(newBook)})
    VALUES (${db.vals(newBook)})`,
  insertedBooks: s.books.Selectable[] = await query.run(pool);
INSERT INTO "books" ("author_id", "title") VALUES ($1, $2)
=> []

Shortcut select with lateral join

The shortcut SQL gets a bit hairy with all the key-transforming code:

const
  q = await db.select('authors', db.all, {
    lateral: { books: db.select('books', { authorId: db.parent('id') }) }
  }),
  r = await q.run(pool);
SELECT coalesce(jsonb_agg(result), '[]') AS result FROM (SELECT (SELECT jsonb_object_agg((
    SELECT string_agg(CASE WHEN i = 1 THEN s ELSE upper(left(s, 1)) || right(s, -1) END, NULL) 
    FROM regexp_split_to_table(key, '_(?=[a-z])') WITH ORDINALITY AS rstt(s, i)
  ), value) FROM jsonb_each(to_jsonb("authors".*))) || jsonb_build_object($1::text, "lateral_books".result) AS result FROM "authors" LEFT JOIN LATERAL (SELECT coalesce(jsonb_agg(result), '[]') AS result FROM (SELECT (SELECT jsonb_object_agg((
    SELECT string_agg(CASE WHEN i = 1 THEN s ELSE upper(left(s, 1)) || right(s, -1) END, NULL) 
    FROM regexp_split_to_table(key, '_(?=[a-z])') WITH ORDINALITY AS rstt(s, i)
  ), value) FROM jsonb_each(to_jsonb("books".*))) AS result FROM "books" WHERE ("author_id" = "authors"."id")) AS "sq_books") AS "lateral_books" ON true) AS "sq_authors"

=> [
  {
    id: 1,
    name: 'Jane Austen',
    books: [
      {
        id: 121,
        title: 'Pride and Prejudice',
        authorId: 1,
        createdAt: '2022-06-08T22:29:50.246832+01:00',
        updatedAt: '2022-06-08T22:29:50.246832+01:00'
      }
    ],
    isLiving: false
  }
]

Shortcut select with lateral joins and named columns

But it's actually no worse than before if you pick specific columns:

const people = await db.select('employees', db.all, {
  columns: ['name'],
  order: { by: 'name', direction: 'ASC' },
  lateral: {
    lineManager: db.selectOne('employees', { id: db.parent('managerId') },
      { alias: 'managers', columns: ['name'] }),
    directReports: db.count('employees', { managerId: db.parent('id') },
      { alias: 'reports' }),
  },
}).run(pool);
SELECT coalesce(jsonb_agg(result), '[]') AS result FROM (SELECT jsonb_build_object($1::text, "name") || jsonb_build_object($2::text, "lateral_directReports".result, $3::text, "lateral_lineManager".result) AS result FROM "employees" 
LEFT JOIN LATERAL (SELECT count("reports".*) AS result FROM "employees" AS "reports" WHERE ("manager_id" = "employees"."id")) AS "lateral_directReports" ON true 
LEFT JOIN LATERAL (SELECT jsonb_build_object($4::text, "name") AS result FROM "employees" AS "managers" WHERE ("id" = "employees"."manager_id") LIMIT $5) AS "lateral_lineManager" ON true ORDER BY "name" ASC) AS "sq_employees"

=> [
  { name: 'Anna', lineManager: null, directReports: 2 },
  { name: 'Beth', lineManager: { name: 'Anna' }, directReports: 1 },
  { name: 'Charlie', lineManager: { name: 'Anna' }, directReports: 0 },
  { name: 'Dougal', lineManager: { name: 'Beth' }, directReports: 0 }
]

Complex manual queries

Finally, what about using aggregates manually? Here we can tap into the configurable SQLFragments:

type authorBooksSQL = s.authors.SQL | s.books.SQL;
type authorBooksSelectable = s.authors.Selectable & { books: s.books.Selectable[] };

const
  { allColumnsJSON } = db.getConfig().nameTransforms.pg,
  query = db.sql<authorBooksSQL>`
    SELECT ${"authors"}.*, coalesce(json_agg(${allColumnsJSON("books")}) FILTER (WHERE ${"books"}.* IS NOT NULL), '[]') AS ${"books"}
    FROM ${"authors"} LEFT JOIN ${"books"} 
      ON ${"authors"}.${"id"} = ${"books"}.${"authorId"}
    GROUP BY ${"authors"}.${"id"}`,
  authorBooks: authorBooksSelectable[] = await query.run(pool);
SELECT "authors".*, coalesce(json_agg((SELECT jsonb_object_agg((
    SELECT string_agg(CASE WHEN i = 1 THEN s ELSE upper(left(s, 1)) || right(s, -1) END, NULL) 
    FROM regexp_split_to_table(key, '_(?=[a-z])') WITH ORDINALITY AS rstt(s, i)
  ), value) FROM jsonb_each(to_jsonb("books".*)))) FILTER (WHERE "books".* IS NOT NULL), '[]') AS "books"
        FROM "authors" LEFT JOIN "books" 
          ON "authors"."id" = "books"."author_id"
        GROUP BY "authors"."id"

=> [
  {
    id: 1,
    name: 'Jane Austen',
    isLiving: false,
    books: [
      {
        id: 121,
        title: 'Pride and Prejudice',
        authorId: 1,
        createdAt: '2022-06-08T22:29:50.246832+01:00',
        updatedAt: '2022-06-08T22:29:50.246832+01:00'
      }
    ]
  }
]

jawj commented 2 years ago

So: how do people feel about this? I'm somewhat in two minds on whether to release it. It adds some complexity to the code, and some additional foot-guns when using advanced SQL manually, and it also burns CPU cycles and memory on both the DB and the TS sides (if you use it) purely for cosmetic effect.

On the other hand, @moltar requested it, and for users like @jgonera who are committed to keeping pg in snake_case and TS in camelCase — which I can understand — I guess this feature is non-negotiable if they're going to use the library at all.

moltar commented 2 years ago

I cooled on the idea of even having this. I got used to the 🐍 case.

moltar commented 2 years ago

IMO, looking at the branch, the extra complexity isn't worth the risk of breaking something later.

moltar commented 2 years ago

Thanks for the attempt though 😁

tetchel commented 2 years ago

While this would be a cool cosmetic change, and I admit I dislike the mixing of camel/snake case throughout my TS, I agree with the other commenter that I don't want more complexity for:

  1. the generated typings - the typescript language server is slow enough already. I have noticed with some other libraries it can be quite slow when dealing with complex template literal types, which may apply here with (Camel|Snake)CasedProperties. But I don't know what an actual benchmark would look like.
  2. the generated SQL - the Zapatos-generated shortcut queries are already difficult for a human to parse, and as you admit above, this change makes them significantly hairier. likewise it adds some runtime overhead on every query for a cosmetic DX change.
  3. the library code itself. Obviously, this is a complex change which introduces significant new room for error.

So, while I admire and thank you for the work put into this implementation, I'd personally prefer to err on the side of simplicity for this one.

edit: As the feature is opt-in though, 1. and 2. may be alleviated.

jawj commented 2 years ago

Thanks @tetchel.

1. the generated typings - the typescript language server is slow enough already. I have noticed with some other libraries it can be quite slow when dealing with complex template literal types, which may apply here with `(Camel|Snake)CasedProperties`. But I don't know what an actual benchmark would look like.

There's actually no extra type work for the language server: all the name transforms are done at schema-generation time, and it looks to TypeScript as if your database really does use camelCase for everything.

2. the generated SQL - the Zapatos-generated shortcut queries are already difficult for a human to parse, and as you admit above, this change makes them significantly hairier. likewise it adds some runtime overhead on every query for a cosmetic DX change.

Yes, the SQL does get very hairy when you select or return *. That said, if you pick which columns you want back, it's no hairier than before (and also adds no overhead). If the hairiness for * still worries you, it would also be possible to move the hairy bits out to pure SQL functions in pg, and refer to those in a custom config (see the sketch below). But those queries will still use some extra resources on the pg side. It would be nice to benchmark how much (since no disk access should be involved, my hunch is that it's probably negligible).
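
For instance, a hedged sketch of that idea: assuming you had defined a pure SQL function snake_to_camel(text) in Postgres with the same behaviour as the inline subquery, and that snakeCamelTransforms is importable as exported above, a custom config might look like this:

db.setConfig({
  nameTransforms: {
    ...snakeCamelTransforms,
    pg: {
      ...snakeCamelTransforms.pg,
      // defer the key transform to the assumed snake_to_camel() SQL function
      allColumnsJSON: table => db.sql`(SELECT jsonb_object_agg(snake_to_camel(key), value) FROM jsonb_each(to_jsonb(${table}.*)))`,
    },
  },
});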

3. the library code itself. Obviously, this is a complex change which introduces significant new room for error.

Yes. It is a bit complex, though I was pleasantly surprised that it wasn't worse. My least favourite bit is how at generation-time we end up transforming some names (e.g. enums) both ways at points.

edit: As the feature is opt-in though, 1. and 2. may be alleviated.

Indeed.

Overall: I don't think it's all that bad. But I'm still waiting for even one person to say they still like/want it ...

tbo commented 2 years ago

I think most people here have made their peace with the status quo, but it would likely improve adoption in the long run. I for one would probably use it in my next project. Provided that it doesn't significantly impede the maintenance/stability of the lib, I would give it a go.

jayp commented 2 years ago

I am still deciding between zapatos, pgtyped, and slonik for a new project.

The question is: what does @jawj prefer, popularity or simplicity?

I think they're at a crossroads. Most popular ORMs support snake_case in the DB with camelCase in the language. But being more popular typically requires more features.

As an adopter of a library, I guess I want the simplest library, so I could maintain it if need be. But I also want all the features I want - others be damned. lol.

synicko commented 1 year ago

I'd still very much like to have this feature in the lib. I'm working on a project that needs snake_case in Postgres and camelCase in TypeScript. I built a version from the case-transforms branch, and it seems to do the job perfectly. I would like to convince the firm I'm working for to choose zapatos over any ORM that would be heavier and soooo much less user-friendly to use, so please please add the feature to the official release 🙏 🙏 🙏

pigoz commented 1 year ago

I'm investigating zapatos for a greenfield project.

I'd use this feature if it was available! Otherwise I'd probably use snake_case in TypeScript. camelCase in a database is just wrong IMHO.

jawj commented 1 year ago

Thanks guys. My feeling is it probably is worth releasing this feature, so I'll add that to my TODOs.

SAGV commented 1 year ago

This would be one of the game changers for zapatos, since it would allow us to have the same camelCased field names as those exposed by PostGraphile in GraphQL, for example.