graphile / crystal

๐Ÿ”ฎ Graphile's Crystal Monorepo; home to Grafast, PostGraphile, pg-introspection, pg-sql2 and much more!
https://graphile.org/
Other
12.61k stars 570 forks source link

Question: Compatibility with Citus and Cockroach DB #645

Closed corporatepiyush closed 6 years ago

corporatepiyush commented 6 years ago

Since Citus DB and Cockroach DB use Postgres compatible wire protocol and truly distributed sharding and transactions it would be cool if this library works with them. Is this library has been tested against these Databases ?

benjie commented 6 years ago

As far as I know they have not been tested. Compatibility mostly comes down to Postgres' introspection features - do the databases you mention provide this? You can check by running the introspection query:

https://github.com/graphile/graphile-build/blob/master/packages/graphile-build-pg/res/introspection-query.sql

If that works, there are a few other sticking points (such as CTE support and more) but that's the main things.

benjie commented 6 years ago

I'd love to hear definite results from someone who's tried it ๐Ÿ‘

corporatepiyush commented 6 years ago

Yes Citus is pg extension but it has some restrictions. ref link: https://www.citusdata.com/faq

Do you support all PostgreSQL features and the SQL standard?

Since Citus provides distributed functionality by extending PostgreSQL, it uses the standard PostgreSQL SQL constructs. It provides full SQL support for queries which access a single node in the database cluster. These queries are common, for instance, in multi-tenant applications where different nodes store different tenants (see When to Use Citus).

Other queries which, by contrast, combine data from multiple nodes, do not support the entire spectrum of PostgreSQL features. However they still enjoy broad SQL coverage, including semi-structured data types (like jsonb, hstore), full text search, operators, functions, and foreign data wrappers. Note that the following constructs arenโ€™t supported natively for cross-node queries:

Window Functions
CTEs
Set operations
Transactional semantics for queries that span across multiple shards
corporatepiyush commented 6 years ago

CockroachDB limitations ref link: https://www.cockroachlabs.com/docs/stable/known-limitations.html https://www.cockroachlabs.com/docs/stable/sql-grammar.html

benjie commented 6 years ago

PostGraphile uses CTEs heavily; we also use all those other features that don't have support from Citus. So I'm going to say: "No, we do not support Citus".

MaxDesiatov commented 6 years ago

would it be hard to enable tiered support for these DBs? As in, a user could get basic functionality with Citus and CockroachDB, but can get full native Postgres support by enabling a flag in Postgraphile constructor parameter?

benjie commented 6 years ago

At the moment, the system is built targeting Postgres and leverages its features. We could scale it back so it were more basic (and portable) but we'd be sacrificing a lot - not to mention complicating development. I think with our limited development resources it's not worth considering right now.

corporatepiyush commented 6 years ago

Citus v7.2 released yesterday. It now supports CTE and bunch of other stuffs https://www.citusdata.com/blog/2018/01/26/citus-7-2-ctes-complex-subqueries-set-operations

Do you have test suite which can be run against it ?

benjie commented 6 years ago

Yes; instructions are in the graphile-build README:

https://github.com/graphile/graphile-build#development

Though thinking about it, they seem to exclude the instruction to create a test database and export it's URL:

createdb graphile_test
export TEST_DATABASE_URL="postgres://localhost/graphile_test"
lerna run test
corporatepiyush commented 6 years ago

This is the result I got for Citus v7.2

lerna info version 2.8.0
yarn run v1.3.2
$ true
Done in 0.07s.
yarn run v1.3.2
$ jest .
No tests found
In /home/piyush/work/github/graphile-build/packages/graphql-parse-resolve-info
  4 files checked.
  testMatch:  - 4 matches
  testPathIgnorePatterns: /node_modules/ - 4 matches
  testRegex: __tests__/.*\.test\.js$ - 0 matches
Pattern: "." - 0 matches
Done in 0.40s.
yarn run v1.3.2
$ jest
Done in 1.02s.
yarn run v1.3.2
$ jest
Done in 0.75s.
yarn run v1.3.2
$ scripts/test

Database reset successfully โœ…
Done in 4.43s.
lerna success run Ran npm script 'test' in packages:
lerna success - graphile-build-pg
lerna success - graphile-build
lerna success - graphile
lerna success - graphql-parse-resolve-info
lerna success - postgraphile-core
corporatepiyush commented 6 years ago

Installation of Citus v7.2 was straight forward. I had to remove existing postgres installation on my Ubuntu machine and then follow download instruction here

https://www.citusdata.com/download/

benjie commented 6 years ago

Very promising!

benjie commented 6 years ago

Hey @corporatepiyush Have you been using Citus with PostGraphile? Any issues?

benjie commented 6 years ago

[semi-automated message] We try and keep the open issues to actual issues (bugs, etc); this seems like more of a discussion right now, so I'm closing it but please feel free to keep discussing it below ๐Ÿ‘

mubaidr commented 4 years ago

Since a major use of citus is to develop a multi-vendor database using sharding, how do we manage multi-vendor queries in postgraphile?

benjie commented 4 years ago

Do you mean multi-tenant, or is multi-vendor something else? Postgres supports partitioning, and you can easily add a restrictive RLS policy to each table to enforce tenant isolation:

create policy multitenant on my_table as restrictive using (tenant_id = current_tenant_id());
mubaidr commented 4 years ago

Yes, I mean multi-tenant ๐Ÿคฆ

But does this policy works with citus as well? In normal circumstance, we have to pass tenant_id for each (distributed table) query.

Reason I am curious is because citus has own manager (which distributes/executes query based on same tenant_id) and as it seems this policy thing is low level to citus workers/nodes (and manager might not know about this policy).

BTW I will try to find this thing on citus docs too.

benjie commented 4 years ago

Seems to be similar to what they recommend for multi-tenancy in their RLS docs: https://docs.citusdata.com/en/v9.3/admin_guide/cluster_management.html?highlight=row-level#row-level-security

mubaidr commented 4 years ago

@benjie Thanks for helping me out. Looks like this feature is only available in Enterprise version, time to ditch citus.

corporatepiyush commented 4 years ago

Citus supports many things. IMHO Ditching Citus is harsh decision. Try implementing simple trigger function to mimic row level security.

benjie commented 4 years ago

You can't really mimic RLS with a trigger... Maybe for mutations (at significant overhead), but not for reads. I'd question whether your project actually needs Citus; vanilla Postgres will get you a long way on today's hardware.

benjie commented 3 years ago

At the request of a client I've spent a bit of time looking over what's preventing Cockroach DB from being used with PostGraphile. I've compiled this document by reading through the Known Limitations in CockroachDB v20.2, and then by setting up a free cockroachdb cluster using https://cockroachlabs.cloud/ and attempting to run a (more and more) modified introspection query against it.

Show stopper: no correlated CTE support

CockroachDB does not support correlated CTEs; from the limitations page:

CockroachDB does not support correlated common table expressions. This means that a CTE cannot refer to a variable defined outside the scope of that CTE.

For example, the following query returns an error:

SELECT * FROM users
  WHERE id =
    (WITH rides_home AS
      (SELECT revenue FROM rides
       WHERE end_address = address)
     SELECT rider_id FROM rides_home);
ERROR: CTEs may not be correlated
SQLSTATE: 0A000

This query returns an error because the WITH rides_home clause references a column (address) returned by the SELECT statement at the top level of the query, outside the rides_home CTE definition.

Correlated CTEs are at the heart of how graphile-build-pg builds SQL statements using PostgreSQL's powerful JSON functionality; working around this would be a huge task and would have significant negative performance consequences.

Issues that PostGraphile could potentially work around

From attempting to run the introspection query I noticed the following limitations:

Other issues that you might face using CockroachDB with a PostGraphile project

COMMENT ON comments aren't backed up in database backups; suggest that you use smart tags file/plugins instead of smart comments:

The COMMENT ON statement associates comments to databases, tables, or columns. However, the internal table (system.comments) in which these comments are stored is not captured by a BACKUP of a table or database.

Since DDL and DML cannot be mixed safely, populating enum tables is a little harder (need to use separate transactions); in Graphile Migrate this means separate commits:

[Schema changes within transactions] DDL statements cannot be mixed with DML statements. As a workaround, you can split the statements into separate transactions. For more details, see examples of unsupported statements. [Schema changes within transactions] Schema change DDL statements inside a multi-statement transaction can fail while other statements succeed.

OR expressions are even more expensive than in PostgreSQL - always involving a table scan. This could make some RLS policies very expensive:

Given a query like SELECT * FROM foo WHERE a > 1 OR b > 2, even if there are appropriate indexes to satisfy both a > 1 and b > 2, the query planner performs a full table or index scan because it cannot use both conditions at once.

Untested

I didn't attempt to analyse logical decoding for live, event triggers for watch mode, or row level policies for security.

rafiss commented 3 years ago

Thanks @benjie! I will comment here with the tracking issues in CockroachDB for some of the items you listed:

benjie commented 3 years ago

Thanks @rafiss!

Here's a compiled version of the PostGraphile introspection query that can be ran against a PostgreSQL v11 DB via psql -Xf introspection.sql postgres://user:pass@host:port/dbname. It may or may not work on earlier (or later) versions of PostgreSQL.

https://gist.github.com/benjie/ae8393cb4059594fc9a68d9734bbfe79

hydrandt commented 3 years ago

Very happy to see this discussion alive, thank you for the information!

We are (a non profit = no budget quite yet) building an application with postgraphile that we need to make available both in China and rest of the world. Not really doable without running it in multiple geographical locations. Searching for a solution now. Postgres master-master replication tools, like Bucardo, don't seem very active and reliable.

I ran into one more postgres-compatible distributed database, yugabytedb. They even mention postgraphile in one of their blogs: https://blog.yugabyte.com/graphql-distributed-sql-tips-and-tricks-july-10-2020/

Any information about running postgraphile on yugabytedb?

benjie commented 3 years ago

Any information about running postgraphile on yugabytedb?

According to the article you posted it seems to work. Have you faced any issues with it?

xvaara commented 3 years ago

Just tested YugabyteDB quickly. Queries work, but on mutations:

2021-10-05 20:24:05.369 EEST [4902] ERROR:  SAVEPOINT <transaction> not supported yet at character 1
2021-10-05 20:24:05.369 EEST [4902] HINT:  See https://github.com/YugaByte/yugabyte-db/issues/1125. Click '+' on the description to raise its priority
2021-10-05 20:24:05.370 EEST [4902] ERROR:  ROLLBACK <transaction> not supported yet at character 1
2021-10-05 20:24:05.370 EEST [4902] HINT:  See https://github.com/YugaByte/yugabyte-db/issues/1125. Click '+' on the description to raise its priority

they are working on savepoints and rollbacks: https://github.com/yugabyte/yugabyte-db/issues/9219

xvaara commented 3 years ago

also on worker:

2021-10-05 19:58:37.123 EEST [3972] ERROR:  This ALTER TABLE command is not yet supported.
2021-10-05 19:58:37.123 EEST [3972] STATEMENT:  alter table "graphile_worker".jobs add column key text unique check(length(key) > 0);
apatrida commented 1 year ago

@mubaidr @benjie CITUS 11 eliminates the enterprise concerns:

Fully open source With Citus 11, the remaining Citus Enterprise features are now available as open source, including the non-blocking aspect of the shard rebalancer, tenant isolation, user management, fine-grained access controls, pg_dist_poolinfo, and performance optimizations for data loading.

mubaidr commented 1 year ago

@mubaidr @benjie CITUS 11 eliminates the enterprise concerns:

Fully open source With Citus 11, the remaining Citus Enterprise features are now available as open source, including the non-blocking aspect of the shard rebalancer, tenant isolation, user management, fine-grained access controls, pg_dist_poolinfo, and performance optimizations for data loading.

This is great news! ๐ŸŽˆ

jhg03a commented 1 year ago

Testing on Rocky Linux 8.8 using the latest YugabyteDB 2.18.0.1-b4 with the extra postgis extensions to both yugabyte and postgraphile, everything seemed to work just fine with one exception. LISTEN/NOTIFY isn't supported yet, so postgraphile schema updates on the fly won't work and nor will subscriptions. That's not a huge dealbreaker though for my use cases. Queries, Mutations (with custom functions too), and the JWT auth aspects all work from what I've checked. I wouldn't call it exhaustive testing, but enough I'm willing to not eliminate the possibility of using it outright.

You can use the stock postgres connectivity as always, but you can also use their yugabyte driver just fine as well since it's a fork of the normal pg monorepo that still provides pg.Pool objects for postgraphile.