Multitenant datasources

gavinking commented 4 years ago

Currently the Hibernate extension features some integration with Hibernate multitenancy. Multitenancy in Hibernate essentially boils down to letting the program set a tenant id in a certain context, and having the datasource used by Hibernate selected automatically on the basis of that id. Where "automatically" means you the user can write code to select it.

It seems to me that this concern would be better handled in Quarkus at the level of the Quarkus datasource itself. There's no particular advantage to doing this in Hibernate, other than before this we had control over Hibernate but not over the environment in which Hibernate operated. And there are disadvantages, primarily that non-Hibernate code can't take advantage of the feature.

So I propose that we provide a way to map a contextual tenant id to a schema name, catalog name, or database URI, as a feature of the Quarkus datasource, and remove this feature from the Hibernate extension.

Thoughts?

sberyozkin commented 4 years ago

@gavinking Would it make sense to link OIDC Multi-tenancy with Hibrernate Multi-tenancy closer ? There was a blog post awhile back.

OIDC extension runs first, and when its mult-tenancy is activated, a tenant id is available, can be saved in RoutingContext if needed. Then Hibernate extension would pick it up and use it to choose its own tenant configuration.

What I'm not sure about is if it makes sense to use the same tenant id to 1) select an OIDC configuration (how to verify the token etc) and then 2) to use the same id to to select the configuration to read the data. Perhaps these are conceptually different activities.

@DorianMaliszewski, can you comment please ?

DorianMaliszewski commented 4 years ago

Hi,

Maybe this feature can be implemented by adding a filter after OIDC tenant resolver and in application.properties just quarkus.hibernate.use-oidc-tenant-resolver=true or something close.

It can be a good thing to implement an easy way to link hibernate to oidc tenant resolver. :+1:

gavinking commented 4 years ago

@sberyozkin yes, sure, it sounds like it would be great to integrate with that stuff as a source of tenant ids, though my main concern here in this particular issue is that I want it to be the datasource layer, not Hibernate itself, which is responsible for interpreting the tenant ids.

But yes, it sounds like a good idea.

sberyozkin commented 4 years ago

Hi @DorianMaliszewski thanks, yes, I was thinking of getting this ID automatically, without even having to do something, OIDC would set the ID in the the request context and it will be picked up if other Hibernate or lower level resolver is not available.

sberyozkin commented 4 years ago

Hi @gavinking thanks; sorry, I think I'd create a dedicated issue to discuss linking OIDC and Hibernate/datasource tenants, so as to not to overload this issue :-)

gavinking commented 4 years ago

Hi @gavinking thanks; sorry, I think I'd create a dedicated issue to discuss linking OIDC and Hibernate/datasource tenants, so as to not to overload this issue :-)

Don't apologize, it's a great idea; please open the issue!

michael-schnell commented 4 years ago

It seems to me that this concern would be better handled in Quarkus at the level of the Quarkus datasource itself. There's no particular advantage to doing this in Hibernate, other than before this we had control over Hibernate but not over the environment in which Hibernate operated. And there are disadvantages, primarily that non-Hibernate code can't take advantage of the feature.

At the time the Multitenancy feature was implemented, Hibernate ORM extension was the only place to integrate this in a short amount of time. Unfortunately Hibernate Multitenancy itself isn't really finalized since years. As stated in the manual "The JPA expert group is in the process of defining multitenancy support for an upcoming version of the specification". So this was a best effort approach to rely on what Hibernate offered, even if it is not much.

In general I agree, that the current solution is mainly a way to select the right datasource that is used for the current tenant.

michael-schnell commented 4 years ago

Hi,

Maybe this feature can be implemented by adding a filter after OIDC tenant resolver and in application.properties just quarkus.hibernate.use-oidc-tenant-resolver=true or something close.

It can be a good thing to implement an easy way to link hibernate to oidc tenant resolver. 👍

In general the mechanism for selecting the tenant is pretty much the same in the Hibernate ORM Multitenancy and the ODIC extension. It is very easy to integrate both parts using CDI.

gavinking commented 4 years ago

So I've done a bit more research into how this stuff has been done in Quarkus, and here's a more concrete proposal to kick about.

Introduce a parent/child relationship between "base" datasources and "tenant" datasources.
Tenant datasources inherit the configuration of the base datasource, except for any setting they explicitly override.
The base datasource uses the current tenant identifier (if it is set) to forward connection requests to the corresponding tenant datasource.
Remove the Hibernate-specific code.

Now, configuring a multitenant datasource in Quarkus would look like this:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.jdbc.url=jdbc:postgresql://localhost:5433/mycompany

Or:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.username=mycompany
quarkus.datasource.mycompany.password=secret

The advantages of this approach include:

Multitenancy is available as a feature to all Quarkus code, not just Hibernate.
Repetitive datasource configuration is minimized.
Things other than just the catalog and schema name can be configured per-tenant (e.g. the user id above).

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties. However, I must admit I'm not certain about this at all, and so perhaps there's some reason to have this as a whole different strategy.

gavinking commented 4 years ago

@Sanne I'm still waiting to hear your take on all this.

gavinking commented 4 years ago

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties.

Well, even if there is a good reason to use setSchema() instead of the JDBC connection properties, we can still abstract that as a property of the datasource via .schema (and for completeness .catalog) properties:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.schema=mycompany

michael-schnell commented 4 years ago

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties.

Well, even if there is a good reason to use setSchema() instead of the JDBC connection properties, we can still abstract that as a property of the datasource via .schema (and for completeness .catalog) properties:
# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.schema=mycompany

If I remember it correctly, "setSchema(..)" didn't work either on MariaDB or on PostgreSQL. Some databases like MariaDB/MySQL do not support database schemas. In these cases you have to use the DATABASE approach.

michael-schnell commented 4 years ago

So I've done a bit more research into how this stuff has been done in Quarkus, and here's a more concrete proposal to kick about.

Introduce a parent/child relationship between "base" datasources and "tenant" datasources.

Tenant datasources inherit the configuration of the base datasource, except for any setting they explicitly override.

The base datasource uses the current tenant identifier (if it is set) to forward connection requests to the corresponding tenant datasource.

Remove the Hibernate-specific code.

Now, configuring a multitenant datasource in Quarkus would look like this:
# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.jdbc.url=jdbc:postgresql://localhost:5433/mycompany
Or:
# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.username=mycompany
quarkus.datasource.mycompany.password=secret
The advantages of this approach include:

Multitenancy is available as a feature to all Quarkus code, not just Hibernate.

Repetitive datasource configuration is minimized.

Things other than just the catalog and schema name can be configured per-tenant (e.g. the user id above).

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties. However, I must admit I'm not certain about this at all, and so perhaps there's some reason to have this as a whole different strategy.

Sounds like a good approach, but seems mainly to optimize the configuration setup.

How do you tell Hibernate which is the currently selected tenant connection/datasource to use? Most likely this is connected to a request scope.

gavinking commented 4 years ago

@michael-schnell

How do you tell Hibernate which is the currently selected tenant connection/datasource to use?

Well you don't need to. That's the whole idea. You just point Hibernate at the base datasource which takes care of redirecting the connection requests to the correct contextual tenant datasource, just as it would do for any other client.

Hibernate multitenancy doesn't need to be involved in this at all.

gavinking commented 4 years ago

@michael-schnell

If I remember it correctly, "setSchema(..)" didn't work either on MariaDB or on PostgreSQL.

Eyeballing the code of the Postgres JDBC driver, I believe Postgres does support setSchema(), but no, of course, MySQL (and derivative MariaDB) don't have schemas, so they don't support it.

venki157 commented 4 years ago

Anything concluded on this multitenency issue. When can we expect this fix?

DorianMaliszewski commented 4 years ago

Refer to this issue please : https://github.com/quarkusio/quarkus/issues/11949

michael-schnell commented 4 years ago

I would love to do it, but at the moment I have no time. Hopefully there is some spare time during the Christmas holidays. As suggested by Gavin the idea is to move the functionality out of Hibernate extension into the datasources extension. Definitely I will also need support from the datasources maintainers for this task.

ltsallas commented 2 years ago

Other than the shared database, different schema multitenancy, there can be a shared database/shared schema approach with Postgres Row Level Security and a session parameter as discussed in the "Alternative approach" section in this article

I made a question on stackoverflow asking for a Quarkus solution for this case - I didn't find a straight forward way to do so. I am just sharing it here since the - outside hibernate- multitenant datasources solution you are discussing so far might take into account the shared db & schema multitenancy as well

quarkusio / quarkus

Multitenant datasources #11861