Closed corporatepiyush closed 6 years ago
As far as I know they have not been tested. Compatibility mostly comes down to Postgres' introspection features - do the databases you mention provide this? You can check by running the introspection query:
If that works, there are a few other sticking points (such as CTE support and more) but that's the main things.
I'd love to hear definite results from someone who's tried it ๐
Yes Citus is pg extension but it has some restrictions. ref link: https://www.citusdata.com/faq
Do you support all PostgreSQL features and the SQL standard?
Since Citus provides distributed functionality by extending PostgreSQL, it uses the standard PostgreSQL SQL constructs. It provides full SQL support for queries which access a single node in the database cluster. These queries are common, for instance, in multi-tenant applications where different nodes store different tenants (see When to Use Citus).
Other queries which, by contrast, combine data from multiple nodes, do not support the entire spectrum of PostgreSQL features. However they still enjoy broad SQL coverage, including semi-structured data types (like jsonb, hstore), full text search, operators, functions, and foreign data wrappers. Note that the following constructs arenโt supported natively for cross-node queries:
Window Functions
CTEs
Set operations
Transactional semantics for queries that span across multiple shards
CockroachDB limitations ref link: https://www.cockroachlabs.com/docs/stable/known-limitations.html https://www.cockroachlabs.com/docs/stable/sql-grammar.html
PostGraphile uses CTEs heavily; we also use all those other features that don't have support from Citus. So I'm going to say: "No, we do not support Citus".
would it be hard to enable tiered support for these DBs? As in, a user could get basic functionality with Citus and CockroachDB, but can get full native Postgres support by enabling a flag in Postgraphile constructor parameter?
At the moment, the system is built targeting Postgres and leverages its features. We could scale it back so it were more basic (and portable) but we'd be sacrificing a lot - not to mention complicating development. I think with our limited development resources it's not worth considering right now.
Citus v7.2 released yesterday. It now supports CTE and bunch of other stuffs https://www.citusdata.com/blog/2018/01/26/citus-7-2-ctes-complex-subqueries-set-operations
Do you have test suite which can be run against it ?
Yes; instructions are in the graphile-build README:
https://github.com/graphile/graphile-build#development
Though thinking about it, they seem to exclude the instruction to create a test database and export it's URL:
createdb graphile_test
export TEST_DATABASE_URL="postgres://localhost/graphile_test"
lerna run test
This is the result I got for Citus v7.2
lerna info version 2.8.0
yarn run v1.3.2
$ true
Done in 0.07s.
yarn run v1.3.2
$ jest .
No tests found
In /home/piyush/work/github/graphile-build/packages/graphql-parse-resolve-info
4 files checked.
testMatch: - 4 matches
testPathIgnorePatterns: /node_modules/ - 4 matches
testRegex: __tests__/.*\.test\.js$ - 0 matches
Pattern: "." - 0 matches
Done in 0.40s.
yarn run v1.3.2
$ jest
Done in 1.02s.
yarn run v1.3.2
$ jest
Done in 0.75s.
yarn run v1.3.2
$ scripts/test
Database reset successfully โ
Done in 4.43s.
lerna success run Ran npm script 'test' in packages:
lerna success - graphile-build-pg
lerna success - graphile-build
lerna success - graphile
lerna success - graphql-parse-resolve-info
lerna success - postgraphile-core
Installation of Citus v7.2 was straight forward. I had to remove existing postgres installation on my Ubuntu machine and then follow download instruction here
Very promising!
Hey @corporatepiyush Have you been using Citus with PostGraphile? Any issues?
[semi-automated message] We try and keep the open issues to actual issues (bugs, etc); this seems like more of a discussion right now, so I'm closing it but please feel free to keep discussing it below ๐
Since a major use of citus
is to develop a multi-vendor database using sharding, how do we manage multi-vendor queries in postgraphile
?
Do you mean multi-tenant, or is multi-vendor something else? Postgres supports partitioning, and you can easily add a restrictive RLS policy to each table to enforce tenant isolation:
create policy multitenant on my_table as restrictive using (tenant_id = current_tenant_id());
Yes, I mean multi-tenant
๐คฆ
But does this policy works with citus
as well? In normal circumstance, we have to pass tenant_id
for each (distributed table) query.
Reason I am curious is because citus
has own manager (which distributes/executes query based on same tenant_id
) and as it seems this policy thing is low level to citus
workers/nodes (and manager might not know about this policy).
BTW I will try to find this thing on citus
docs too.
Seems to be similar to what they recommend for multi-tenancy in their RLS docs: https://docs.citusdata.com/en/v9.3/admin_guide/cluster_management.html?highlight=row-level#row-level-security
@benjie Thanks for helping me out. Looks like this feature is only available in Enterprise version, time to ditch citus
.
Citus supports many things. IMHO Ditching Citus is harsh decision. Try implementing simple trigger function to mimic row level security.
You can't really mimic RLS with a trigger... Maybe for mutations (at significant overhead), but not for reads. I'd question whether your project actually needs Citus; vanilla Postgres will get you a long way on today's hardware.
At the request of a client I've spent a bit of time looking over what's preventing Cockroach DB from being used with PostGraphile. I've compiled this document by reading through the Known Limitations in CockroachDB v20.2, and then by setting up a free cockroachdb cluster using https://cockroachlabs.cloud/ and attempting to run a (more and more) modified introspection query against it.
CockroachDB does not support correlated CTEs; from the limitations page:
CockroachDB does not support correlated common table expressions. This means that a CTE cannot refer to a variable defined outside the scope of that CTE.
For example, the following query returns an error:
SELECT * FROM users WHERE id = (WITH rides_home AS (SELECT revenue FROM rides WHERE end_address = address) SELECT rider_id FROM rides_home);
ERROR: CTEs may not be correlated SQLSTATE: 0A000
This query returns an error because the WITH rides_home clause references a column (address) returned by the SELECT statement at the top level of the query, outside the rides_home CTE definition.
Correlated CTEs are at the heart of how graphile-build-pg builds SQL statements using PostgreSQL's powerful JSON functionality; working around this would be a huge task and would have significant negative performance consequences.
From attempting to run the introspection query I noticed the following limitations:
operator
keywords as in the a operator(pg_catalog.<@) b
syntax is not supported; we can use a <@ b
but this allows for ambiguity of operators which causes issues for some databases depending on what extensions are installed (I think intarray
causes issues with the introspection query, for example). We could solve this with cockroachdb-specific introspection query.pg_catalog.pg_relation_is_updatable
is not supported; I solved this by just always returning true, but this is undesirable (particularly for views).select row_to_json(x) from my_table x
is not supported, cockroachdb assumes x
is a column rather than a table reference. We can use select row_to_json(x.*) from my_table x
but it's a big lift to do this across the PostGraphile codebase.COMMENT ON
comments aren't backed up in database backups; suggest that you use smart tags file/plugins instead of smart comments:
The COMMENT ON statement associates comments to databases, tables, or columns. However, the internal table (system.comments) in which these comments are stored is not captured by a BACKUP of a table or database.
Since DDL and DML cannot be mixed safely, populating enum tables is a little harder (need to use separate transactions); in Graphile Migrate this means separate commits:
[Schema changes within transactions] DDL statements cannot be mixed with DML statements. As a workaround, you can split the statements into separate transactions. For more details, see examples of unsupported statements. [Schema changes within transactions] Schema change DDL statements inside a multi-statement transaction can fail while other statements succeed.
OR
expressions are even more expensive than in PostgreSQL - always involving a table scan. This could make some RLS policies very expensive:
Given a query like SELECT * FROM foo WHERE a > 1 OR b > 2, even if there are appropriate indexes to satisfy both a > 1 and b > 2, the query planner performs a full table or index scan because it cannot use both conditions at once.
I didn't attempt to analyse logical decoding for live, event triggers for watch mode, or row level policies for security.
Thanks @benjie! I will comment here with the tracking issues in CockroachDB for some of the items you listed:
row_to_json(x)
vs row_to_json(x.*)
: https://github.com/cockroachdb/cockroach/issues/60549Thanks @rafiss!
Here's a compiled version of the PostGraphile introspection query that can be ran against a PostgreSQL v11 DB via psql -Xf introspection.sql postgres://user:pass@host:port/dbname
. It may or may not work on earlier (or later) versions of PostgreSQL.
https://gist.github.com/benjie/ae8393cb4059594fc9a68d9734bbfe79
Very happy to see this discussion alive, thank you for the information!
We are (a non profit = no budget quite yet) building an application with postgraphile that we need to make available both in China and rest of the world. Not really doable without running it in multiple geographical locations. Searching for a solution now. Postgres master-master replication tools, like Bucardo, don't seem very active and reliable.
I ran into one more postgres-compatible distributed database, yugabytedb. They even mention postgraphile in one of their blogs: https://blog.yugabyte.com/graphql-distributed-sql-tips-and-tricks-july-10-2020/
Any information about running postgraphile on yugabytedb?
Any information about running postgraphile on yugabytedb?
According to the article you posted it seems to work. Have you faced any issues with it?
Just tested YugabyteDB quickly. Queries work, but on mutations:
2021-10-05 20:24:05.369 EEST [4902] ERROR: SAVEPOINT <transaction> not supported yet at character 1
2021-10-05 20:24:05.369 EEST [4902] HINT: See https://github.com/YugaByte/yugabyte-db/issues/1125. Click '+' on the description to raise its priority
2021-10-05 20:24:05.370 EEST [4902] ERROR: ROLLBACK <transaction> not supported yet at character 1
2021-10-05 20:24:05.370 EEST [4902] HINT: See https://github.com/YugaByte/yugabyte-db/issues/1125. Click '+' on the description to raise its priority
they are working on savepoints and rollbacks: https://github.com/yugabyte/yugabyte-db/issues/9219
also on worker:
2021-10-05 19:58:37.123 EEST [3972] ERROR: This ALTER TABLE command is not yet supported.
2021-10-05 19:58:37.123 EEST [3972] STATEMENT: alter table "graphile_worker".jobs add column key text unique check(length(key) > 0);
@mubaidr @benjie CITUS 11 eliminates the enterprise concerns:
Fully open source With Citus 11, the remaining Citus Enterprise features are now available as open source, including the non-blocking aspect of the shard rebalancer, tenant isolation, user management, fine-grained access controls, pg_dist_poolinfo, and performance optimizations for data loading.
@mubaidr @benjie CITUS 11 eliminates the enterprise concerns:
Fully open source With Citus 11, the remaining Citus Enterprise features are now available as open source, including the non-blocking aspect of the shard rebalancer, tenant isolation, user management, fine-grained access controls, pg_dist_poolinfo, and performance optimizations for data loading.
This is great news! ๐
Testing on Rocky Linux 8.8 using the latest YugabyteDB 2.18.0.1-b4 with the extra postgis extensions to both yugabyte and postgraphile, everything seemed to work just fine with one exception. LISTEN/NOTIFY isn't supported yet, so postgraphile schema updates on the fly won't work and nor will subscriptions. That's not a huge dealbreaker though for my use cases. Queries, Mutations (with custom functions too), and the JWT auth aspects all work from what I've checked. I wouldn't call it exhaustive testing, but enough I'm willing to not eliminate the possibility of using it outright.
You can use the stock postgres connectivity as always, but you can also use their yugabyte driver just fine as well since it's a fork of the normal pg monorepo that still provides pg.Pool objects for postgraphile.
Since Citus DB and Cockroach DB use Postgres compatible wire protocol and truly distributed sharding and transactions it would be cool if this library works with them. Is this library has been tested against these Databases ?