npgsql / efcore.pg

Entity Framework Core provider for PostgreSQL
PostgreSQL License
1.52k stars 223 forks source link

postgres-xl support for 'DISTRIBUTE BY' #248

Open mahald opened 6 years ago

mahald commented 6 years ago

Hi,

Wonder wheter it's possible to use postgresql-xl with this ef driver. It would have to be possible to provide the 'DISTRIBUTE BY' then somehow on the the table.

CREATE TABLE disttab (col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1); \d+ disttab CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY REPLICATION; \d+ repltab

i assume it is not yet possible, would it be possible to add this as new Feature ?

roji commented 6 years ago

It's actually quite easy to add this capability - see #199 for a similar example if you want to submit a PR.

You can also easily add this to your migrations by manually editing the migration code and adding the DISTRIBUTE BY clause.

Jmorjsm commented 3 years ago

I'm interested in picking this up. @roji what would your thoughts be on generic naming for annotations in this? I ask as I've been looking for a similar solution for greenplum but this is obviously a feature in multiple distributed postgresql DBs.

roji commented 3 years ago

@Jmorjsm great. I'm not familiar with this feature (or postgres-xsl/greenplum), are you saying multiple distributed PostgreSQL DBs have the exact same command syntax, as a sort of standard? I'd be wary of assuming cross-database compatibility here - even if there's similarity between the different DBs, I'm guessing there are syntax differences as well. A comparison with some links to the different database docs would be useful.

Otherwise, and to be on the safe side, I'd go with an entity type builder extension such as PostgresXLDistributeBy, to make sure there's no confusion about this being applicable to regular PostgreSQL.

There's also the question of what DISTRIBUTE BY accepts as a parameter, and modeling that well. You can write up a quick proposal (nothing too formal), if you'd prefer to submit a PR directly that's fine too.

Jmorjsm commented 3 years ago

Turns out postgres-xl has a few more distribution options than Greenplum, with distribution styles and more distribution strategies. From the postgres-xl documentation:

[ 
  DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } } |
  DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } |
  DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name )
]

I have implemented all of the above in #1697. There is some overlap in the syntax supported by both postgres-xl and greenplum, namely when using DISTRIBUTED BY (column_name) or DISTRIBUTED RANDOMLY, but all other strategies have different syntax. Fortunately in my use case with Greenplum, this is sufficient. From the Greenplum documentation:

[ DISTRIBUTED BY (column [opclass], [ ... ] ) 
       | DISTRIBUTED RANDOMLY | DISTRIBUTED REPLICATED ]