sqlc-dev / sqlc

Generate type-safe code from SQL
https://sqlc.dev
MIT License
11.54k stars 741 forks source link

Add CockroachDB support #1389

Open danthegoodman1 opened 2 years ago

danthegoodman1 commented 2 years ago

When trying to use sqlc for our project I noticed a few things were not supported that we require. We do use CockroachDB with the postgres engine.

  1. The ON UPDATE expression caused SQL validation errors. This is specific to CRDB.
  2. Hash sharded indexes caused SQL validation errors as well
  3. Composite primary keys with ordering like PRIMARY KEY(col1, col2 DESC) said that DESC was an error
  4. The STORING clause was also found to be an error
ajwerner commented 2 years ago

I feel like you don't need to tell sqlc about the exact schema because it doesn't ultimately manage the schema. All of the features you just mentioned are schema features and not query features. Like, you can imagine telling sqlc about a table that doesn't use any of that syntax and then all the queries should still work, right?

danthegoodman1 commented 2 years ago

@ajwerner That is true, but now I need to have SQL generating from a file that is not the true SQL of our schema, which means a second place to manage SQL and manually ensuring that it is consistent with our true SQL schema (which could lead to errors, which sqlc is designed to avoid)

For example we have the following primary key:

PRIMARY KEY(player_id, id DESC) 

sqlc does not like the DESC part of it (considers invalid SQL, which is weird because this works in postgres too...)

Removing the DESC would now be an incorrect version of the table.

Another example is this column:

updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ON UPDATE NOW(), -- auto update the timestamp

We would have to maintain another copy of that that that removes the ON UPDATE. I could see the confusion of developers now manually updating this column... maybe I need big warnings at the top of the fake schema file :)

Initially I think the risk of losing consistency between the SQL schema and the actual schema used is not worth the extra time writing the SQL wrapper functions manually in this case. Since we always have to update the code in 2 places: Either the schema and the sqlc compatible schema, or the schema and the functions that use the affected parts of the schema.

TL;DR I believe you make an excellent point, but I am still considering the balance of maintaining 2 schemas vs a schema and hand written functions.

danthegoodman1 commented 2 years ago

In reality this isn't something we should be constantly iterating on anyway... so maybe that leans more in favor of doing the second fake sqlc schema...

danthegoodman1 commented 2 years ago

After a bit of testing we can actually generate this fake sqlc schema from the real schema as part of the makefile by just copying, and removing things like DESC, ON UPDATE NOW(), and any index lines.

Definitely not ideal but it removes the manual maintaining of 2 schemas

kyleconroy commented 2 years ago

I feel like you don't need to tell sqlc about the exact schema because it doesn't ultimately manage the schema.

sqlc expects to have an accurate view of your schema to power compile-time checks. Maintaining two sets of schemas is not something I want to encourage people to do unless it's working around a current issue.

We do use CockroachDB with the postgres engine.

As long as the syntax you're using is valid PostgreSQL, sqlc should be able to handle it. We don't support any CockroachDB extensions at this time.

Since this issue is actually four different bug reports, I'm going to make four new issues for each one and then close this one out.

danthegoodman1 commented 2 years ago

@kyleconroy

sqlc expects to have an accurate view of your schema to power compile-time checks. Maintaining two sets of schemas is not something I want to encourage people to do unless it's working around a current issue.

Table wise the additional generation I would do wouldn't change anything, it would only remove sorting and auto updating. That doesn't change anything about the column data types, just default values which sqlc does not seem to care about and leave to the query writer.

I do agree it's not ideal, but it's a current workaround that doesn't break sqlc.

danthegoodman1 commented 2 years ago

For example, this part of a table goes from

  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ON UPDATE NOW(), -- auto update the timestamp
  ttl TIMESTAMPTZ NOT NULL,
  tokens INT64 NOT NULL,
  PRIMARY KEY(player_id, ttl DESC, id DESC)

to

 updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- auto update the timestamp
  ttl TIMESTAMPTZ NOT NULL,
  tokens INT64 NOT NULL,
  PRIMARY KEY(player_id, ttl, id)
ajwerner commented 2 years ago

As long as the syntax you're using is valid PostgreSQL, sqlc should be able to handle it. We don't support any CockroachDB extensions at this time.

At the end of the day, this is the root of the problem. Some of these are cockroach-specific syntax extensions. There's talk afoot of making the USING HASH syntax actually postgres compliant. I appreciate that you're spinning out issues to support the various features. Here's some color on each of them. I'll move the commentary over to the issues.

  1. The ON UPDATE expression caused SQL validation errors. This is specific to CRDB.

This is wholly cockroach-exclusive syntax added in 21.2. We added it pragramatically to side-step not having generalized triggers. It's very valuable, but definitely not postgres syntax. #1391

  1. Hash sharded indexes caused SQL validation errors as well

We're removing the need to specify the BUCKET_COUNT in 22.1, which, I think, when omitting that parameter, will be postgres compatible syntax.

  1. Composite primary keys with ordering like PRIMARY KEY(col1, col2 DESC) said that DESC was an error

This is a cockroach extension.

  1. The STORING clause was also found to be an error

This is a cockroach extension.

kyleconroy commented 2 years ago

All four of these are CockroachDB extensions or features. Primary key ordering isn't a PostgreSQL feature, at least not as of PostgreSQL.

ajwerner commented 2 years ago

Sorry about the primary key ordering comment. Yeah, these are all extensions.

Hades32 commented 2 years ago

Another extension that SQLC would just need to keep verbatim is AS OF SYSTEM TIME, which is used like this:

SELECT * FROM codes
AS OF SYSTEM TIME follower_read_timestamp()
WHERE code = $1
tnarg commented 2 years ago

CockroachDB tables also have a special system column crdb_internal_mvcc_timestamp[1] which should be selectable. I'm curious if system columns in postgresql are currently supported.

1: https://github.com/cockroachdb/cockroach/pull/51494

ajwerner commented 2 years ago

Much work has been done in recents months to make the cockroach SQL parser depend on less code. It's newly plausible that we'll have a go get-able parser one day in the not-so-distant future. I imagine that'll be a prerequisite to getting first-class support.

nickjackson commented 2 years ago

Much work has been done in recents months to make the cockroach SQL parser depend on less code. It's newly plausible that we'll have a go get-able parser one day in the not-so-distant future.

@ajwerner I'm really excited about this! I've got a schema diffing tool that I'd like to open source, but I think it would be more appropriate to wait for your parser to be go get'able before I do.

Hades32 commented 1 year ago

Another feature for the list: UPSERT. From slqc's point of view it's equivalent to INSERT

ajwerner commented 1 year ago

@nickjackson the parser library now exists! See https://github.com/cockroachdb/cockroachdb-parser.

bgentry commented 1 year ago

We're big fans of sqlc at @muxinc. However we're also big users of CockroachDB, and have recently started hitting some of the pain points of using the two in combination. We've already been manually managing a 2nd Postgres-compatible schema definition to power sqlc, and that's worked ok so far. Of course we'd love to avoid doing that.

More recently, however, we've started to be hit more directly by the need for CockroachDB syntax support. CockroachDB's planner is fairly limited in its index selection abilities as compared to Postgres, so we often find that we need to explicitly specify the target index for a query via index hints (SELECT FROM my_table@my_index_name) in order for the query to take advantage of the desired index. Without this, we are unable to safely convert some legacy code to sqlc because we can't rely on CockroachDB to take advantage of the correct secondary index on its own 😕

Definitely count us in as a big 👍 for this feature :pray:

kyleconroy commented 1 year ago

The main reason I haven’t prioritized Cockroach support in the past was the difficulty in accessing just the parser and the licensing issues. It sounds like both are now fixed.

Will need to do a bit of investigating 🕵️‍♀️

russoj88 commented 11 months ago

Here's something I came across while evaluating sqlc:

schema.sql

CREATE TABLE t
(
    id         UUID   NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
    mixed_caps STRING NOT NULL,
    lower_case STRING NOT NULL AS (lower(mixed_caps)) STORED
)
schema.sql:5:33: syntax error at or near "AS"
ajwerner commented 11 months ago

@russoj88 I think for this one you can do it like:

CREATE TABLE t
(
    id         UUID   NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
    mixed_caps STRING NOT NULL,
    lower_case STRING NOT NULL GENERATED ALWAYS AS (lower(mixed_caps)) STORED
)
russoj88 commented 11 months ago

@ajwerner that did work. Thanks!