Open bajtos opened 4 years ago
I don't have the bandwidth to look into this topic in full, but I would like to dump a few ideas & pointers to make it easier for others to do the research.
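Conceptually, querying a database table involves the following actors:
1. Model describes the shape of the data (table columns) and metadata (table name, DDL schema name, etc.)
2. Repository binds a model with a datasource and registers the model with the datasource. At the moment, this involves converting the LB4 model definition to LB3/juggler style.
3. Datasource represents a database client and maintains a pool of open database connections. The datasource is configured with a connector to use (PostgreSQL, MongoDB, etc.), connection settings (host/port, credentials) and, most importantly, the database name to use. Most (if not all) LoopBack connectors require each datasource to use only one database name; it's not possible to switch between databases at runtime.
4. The database server.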
In a typical LB4 application, each model is associated with the same Repository class and the same DataSource instance - the wiring is static.
To enable multi-tenancy, we want to make this wiring dynamic. Depending on the current user, we want to use a different repository and/or datasource configuration.
Lightweight tenant isolation using schemas
In this setup, the authentication layer and all tenants share the same database name and use the same credentials (database user) to access the data. We have 1+N schemas defined in the database: the first schema is used by the authentication layer, plus we have one schema for each tenant. All database queries will use the same LB datasource and thus share the same connection pool.
Implementation-wise, we need to tweak the way an LB4 model is registered with a datasource. Instead of creating the same backing juggler model for all users, we want to create tenant-specific juggler models.
Conceptually, this can be accomplished by tweaking the Repository constructor.
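```typescript
import {inject} from '@loopback/core';
import {DefaultCrudRepository, juggler} from '@loopback/repository';
import {SecurityBindings, UserProfile} from '@loopback/security';
// path assumes the default LB4 project layout
import {Product} from '../models';

export class ProductRepository extends DefaultCrudRepository<
  Product,
  typeof Product.prototype.id
> {
  constructor(
    @inject('datasources.db') dataSource: juggler.DataSource,
    @inject(SecurityBindings.USER) currentUser: UserProfile,
  ) {
    super(
      // model constructor
      Product,
      // datasource to use
      dataSource,
      // new feature to be implemented in @loopback/repository:
      // allow repository users to overwrite model settings
      {schema: currentUser.name},
    );
  }
}
```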
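To make the effect of a per-tenant `schema` setting more concrete, here is a minimal sketch, written without LoopBack, of how one schema per tenant could translate into schema-qualified table names. The helpers `assertSafeIdentifier` and `qualifiedTable` are hypothetical, and the PostgreSQL-style double-quote quoting is an assumption.

```typescript
// Hypothetical helpers, not part of LoopBack: they only illustrate how
// one schema per tenant translates into schema-qualified table names.

// Allow only simple identifiers so a tenant name cannot inject SQL.
function assertSafeIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe identifier: ${name}`);
  }
  return name;
}

// Build "schema"."table" (PostgreSQL-style quoting is an assumption).
function qualifiedTable(tenantSchema: string, table: string): string {
  return `"${assertSafeIdentifier(tenantSchema)}"."${assertSafeIdentifier(table)}"`;
}

// The same model maps to a different table for each tenant.
const acmeProducts = qualifiedTable('acme', 'product');
const globexProducts = qualifiedTable('globex', 'product');
```

Validating the tenant name matters here because it would typically come from user-related data and ends up inside an SQL identifier.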
Datasource-based tenant isolation
If schema-based isolation is not good enough (or not supported by the target database), or if we don't want tenants to share the same database connection pool, then we can wire our application to use a different datasource for each tenant. This approach unlocks new options for tenant isolation; for example, it's possible to use a different database & credentials for each tenant.
LB4 applications are already using Dependency Injection to obtain the datasource instance to be provided to Repository constructors. By default, a datasource is bound in a static way and configured as a singleton, see `app.dataSource()`.
To support multi-tenancy, we need to rework the resolution of datasources to be dynamic, based on the current user.
Let's start from the outside. To make it easy to inject the tenant-specific datasource, let's keep the same datasource name (binding key), e.g. `datasources.tenantData`, but implement dynamic resolution of the datasource value. The idea is to rework the datasource class scaffolded by `lb4 datasource` into a Provider class.
```typescript
import {inject, Provider} from '@loopback/core';
import {juggler} from '@loopback/repository';
import {SecurityBindings, UserProfile} from '@loopback/security';

const config = {
  name: 'tenantData',
  connector: 'postgresql',
  // ...
};

// Illustrative cache only: I am leaving the caching strategy as something to
// figure out as part of the research. A module-level Map keyed by tenant is
// just one simple option.
const dataSourceCache = new Map<string | undefined, TenantDataSource>();

export class TenantDataSourceProvider implements Provider<TenantDataSource> {
  constructor(
    @inject('datasources.config.tenant', {optional: true})
    private dsConfig: object = config,
    @inject(SecurityBindings.USER)
    private currentUser: UserProfile,
  ) {}

  value() {
    const tenant = this.currentUser.name;
    const config = {
      ...this.dsConfig,
      // apply tenant-specific settings
      schema: tenant,
    };

    // Because we are using the same binding key for multiple datasource instances,
    // we need to implement our own caching behavior to support SINGLETON scope
    const cached = dataSourceCache.get(tenant);
    if (cached) return cached;

    const ds = new TenantDataSource(config);
    // store the instance in the cache
    dataSourceCache.set(tenant, ds);
    return ds;
  }
}

export class TenantDataSource extends juggler.DataSource {
  static dataSourceName = 'tenant';
  // constructor is not needed, we can use the inherited one.
  // start/stop methods are needed, I am skipping them for brevity
}
```
There are different ways to implement caching of per-tenant datasources. Ideally, I would like to reuse Context for that. It turns out this is pretty simple!
We want each tenant datasource to have its own datasource name and binding key. To allow repositories to obtain the datasource via `@inject`, we can implement a "proxy" datasource provider that resolves to one of the named datasources.
```typescript
import {
  Application,
  Context,
  CoreBindings,
  inject,
  Provider,
} from '@loopback/core';
import {juggler} from '@loopback/repository';
import {SecurityBindings, UserProfile} from '@loopback/security';

// `config` is the default datasource configuration from the previous snippet.

export class TenantDataSourceProvider implements Provider<TenantDataSource> {
  private dataSourceName: string;
  private bindingKey: string;

  constructor(
    @inject('datasources.config.tenant', {optional: true})
    private dsConfig: object = config,
    @inject(SecurityBindings.USER)
    private currentUser: UserProfile,
    @inject.context()
    private currentContext: Context,
    @inject(CoreBindings.APPLICATION_INSTANCE)
    private app: Application,
  ) {
    this.dataSourceName = `tenant-${this.currentUser.name}`;
    this.bindingKey = `datasources.${this.dataSourceName}`;
  }

  value() {
    if (!this.currentContext.isBound(this.bindingKey)) {
      this.setupDataSource();
    }
    return this.currentContext.get<juggler.DataSource>(this.bindingKey);
  }

  private setupDataSource() {
    const resolvedConfig = {
      ...this.dsConfig,
      // apply tenant-specific settings
      schema: this.currentUser.name,
    };
    const ds = new TenantDataSource(resolvedConfig);
    // Important! We need to bind the datasource to the root (application-level)
    // context to reuse the same datasource instance for all requests.
    this.app.bind(this.bindingKey).to(ds).tag({
      name: this.dataSourceName,
      type: 'datasource',
      namespace: 'datasources',
    });
  }
}

export class TenantDataSource extends juggler.DataSource {
  // no static members like `dataSourceName`
  // constructor is not needed, we can use the inherited one.
  // start/stop methods are needed, I am skipping them for brevity
}
```
The code example above creates a per-tenant datasource automatically when the first request is made by each tenant. This should provide faster app startup and possibly less pressure on the database when most tenants use the app only infrequently. On the other hand, any problems with a tenant-specific database connection will be discovered only after the first request is made, which may be too late. If you prefer to establish (and check) all tenant database connections right at startup, you can move the code from `setupDataSource` to a boot script and invoke it for each known tenant.
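The eager alternative described above could look roughly like the following sketch. It does not use LoopBack: `TenantDataSourceLike`, `DataSourceFactory` and `warmUpTenants` are hypothetical stand-ins, and the connection check is kept synchronous for brevity, whereas a real check against a database would be asynchronous.

```typescript
// Stand-in types for illustration only; a real app would use
// juggler.DataSource and an asynchronous connection check.
interface TenantDataSourceLike {
  name: string;
  schema: string;
}

type DataSourceFactory = (tenant: string) => TenantDataSourceLike;

interface WarmUpResult {
  registry: Map<string, TenantDataSourceLike>;
  failures: Map<string, string>;
}

// Create one datasource per known tenant at startup and record failures,
// instead of waiting for the first request from each tenant.
function warmUpTenants(
  tenants: string[],
  createDataSource: DataSourceFactory,
): WarmUpResult {
  const registry = new Map<string, TenantDataSourceLike>();
  const failures = new Map<string, string>();
  for (const tenant of tenants) {
    const key = `datasources.tenant-${tenant}`;
    try {
      registry.set(key, createDataSource(tenant));
    } catch (err) {
      failures.set(tenant, err instanceof Error ? err.message : String(err));
    }
  }
  return {registry, failures};
}

// Usage: "globex" simulates a tenant whose database is unreachable.
const result = warmUpTenants(['acme', 'globex'], tenant => {
  if (tenant === 'globex') throw new Error('connection refused');
  return {name: `tenant-${tenant}`, schema: tenant};
});
```

Collecting failures per tenant, rather than aborting on the first one, lets the application decide whether a single broken tenant should block startup.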
Open questions:
- Where to put this new provider file and how to register the provider at boot time? We don't want `@loopback/boot` to treat it as a regular datasource file, because we want to bind the provider, not the datasource class.
- How to enable `lb4 repository` (and other) commands to recognize our special datasource and include it in the list of known datasources?

A possible solution is to enhance `@loopback/boot` to support datasource providers (e.g. using the convention `datasources/{name}.datasource.provider.ts`) and also enhance `@loopback/cli` to recognize these providers too.
@bajtos, thank you for starting this discussion. Great start.
A few more comments on the examples provided in my previous comment:
I assumed that the name of the current user is the tenant id. In a real app, we will need to map users to tenants first.
In the first example, where I am passing custom model settings to base repository constructors, we will need to include a unique model name to use, in addition to the custom schema. Otherwise all tenants would share the same backing model.
```typescript
const tenant = currentUser.name; // for simplicity
super(
  // model constructor
  Product,
  // datasource to use
  dataSource,
  // new feature to be implemented in @loopback/repository:
  // allow repository users to overwrite model settings
  {name: `Product_${tenant}`, schema: tenant},
);
```
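The two points above (mapping users to tenants, and giving each tenant's backing model a unique name) can be sketched without LoopBack. Everything named here (`userTenants`, `tenantOf`, `modelSettingsFor`, `backingModelFor`) is hypothetical and only illustrates the idea.

```typescript
// Hypothetical lookup table: which tenant each user belongs to.
const userTenants = new Map<string, string>([
  ['alice', 'acme'],
  ['bob', 'acme'],
  ['carol', 'globex'],
]);

// Map a user to a tenant instead of assuming user name === tenant id.
function tenantOf(userName: string): string {
  const tenant = userTenants.get(userName);
  if (tenant === undefined) {
    throw new Error(`No tenant found for user ${userName}`);
  }
  return tenant;
}

interface TenantModelSettings {
  name: string;
  schema: string;
}

// Derive per-tenant model settings: a unique model name plus the schema.
function modelSettingsFor(baseModelName: string, tenant: string): TenantModelSettings {
  return {name: `${baseModelName}_${tenant}`, schema: tenant};
}

// Cache backing models by their unique name; with a shared name, all
// tenants would reuse whichever model was registered first.
const backingModels = new Map<string, TenantModelSettings>();

function backingModelFor(baseModelName: string, userName: string): TenantModelSettings {
  const settings = modelSettingsFor(baseModelName, tenantOf(userName));
  let model = backingModels.get(settings.name);
  if (!model) {
    model = settings;
    backingModels.set(settings.name, model);
  }
  return model;
}
```

Two users of the same tenant end up with the same backing model, while a user of another tenant gets a separate one bound to a different schema.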
I see a few tiers/components to enforce multi-tenancy.
Identify the tenant id for a request
Locate tenant specific resources, such as datasources for databases and services
Bind tenant specific resources to the request context
Database multi-tenancy
`tenantId` (for searching, for sharding etc.)
@bajtos Thanks for responding to my DM on Twitter and starting this conversation. Mind sharing how these data sources are injected into repositories since `this.bindingKey`s are being dynamically generated? I'm talking about the Datasource-based tenant isolation option. Thanks.
FYI: I just built an example application to illustrate multi-tenancy for LoopBack 4 - https://github.com/strongloop/loopback-next/pull/5087.
@King-Success
> Mind sharing how these data sources are injected into repositories since `this.bindingKey`s are being dynamically generated?

IIUC, you are interested in Datasource-based tenant isolation.
The idea is to bind a static datasource key to `TenantDataSourceProvider`, which will resolve to one of the dynamically-created datasources.
For example, in the app constructor:
```typescript
this.bind('datasources.tenant').toProvider(TenantDataSourceProvider);
```
Then you can inject the datasource the usual way, for example:
```typescript
@inject('datasources.tenant')
dataSource: TenantDataSource
```
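The way a static key can resolve to dynamically created datasources can be illustrated with a toy registry. This is a sketch only: `ToyRegistry` and `resolveTenantDataSource` are hypothetical stand-ins that mirror the idea of the provider above, and they are not the LoopBack Context API.

```typescript
// A toy registry standing in for the application-level context.
// It is NOT the LoopBack Context API; it only mirrors the idea.
interface ToyDataSource {
  name: string;
  schema: string;
}

class ToyRegistry {
  private values = new Map<string, ToyDataSource>();
  created = 0;

  isBound(key: string): boolean {
    return this.values.has(key);
  }

  bind(key: string, value: ToyDataSource): void {
    this.values.set(key, value);
  }

  get(key: string): ToyDataSource {
    const value = this.values.get(key);
    if (!value) throw new Error(`Key not bound: ${key}`);
    return value;
  }
}

// The "proxy" step: callers always go through this one entry point, which
// maps the request's tenant to a tenant-specific key and creates the
// value only the first time that key is needed.
function resolveTenantDataSource(registry: ToyRegistry, tenant: string): ToyDataSource {
  const key = `datasources.tenant-${tenant}`;
  if (!registry.isBound(key)) {
    registry.bind(key, {name: `tenant-${tenant}`, schema: tenant});
    registry.created++;
  }
  return registry.get(key);
}

const registry = new ToyRegistry();
const first = resolveTenantDataSource(registry, 'acme');
const second = resolveTenantDataSource(registry, 'acme');
const other = resolveTenantDataSource(registry, 'globex');
```

Repeated requests from the same tenant reuse the instance created on the first request, which is the behavior the SINGLETON-like caching is meant to provide.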
@raymondfeng
> I see a few tiers/components to enforce multi-tenancy.

Thank you for chiming in and adding a wider perspective to this discussion :+1: ❤️
Hello, please let me describe a situation that happened here. About a year ago I started an LB4 project with 1 mainDatasource + 1 TenantDatasource with 9 related tenant schemas on a MySQL database (very similar to @bajtos's lightweight idea). As a newbie, I don't have the skills to create nice LB4 providers, actions, etc., so I simply added this code to sequence.ts:
```typescript
if (authUser && authUser.tenantId) {
  console.log("User: ", authUser.username, " Tenant =>", authUser.defaultTenant, " url:", request.url, "body: ", request.body);
  await this.someRepository.execute('use tenant_?', [authUser.tenantId]);
} else {
  console.log("****** NO AUTHUSER ****** url: ", request.url, " request.headers: ", request.headers);
}
```
In the beginning, while doing some GET/POST tests, I started to receive/save data from/to different tenant schemas instead of the user.tenantId schema. After debugging, I could see that it's related to the mysql.connectionLimit size. It means that within the same API request, one connection was used for "use tenant_x" and another for find/create, etc. Changing connectionLimit to 1 seems to resolve it, but sometimes it is still not effective.
After a year, with millions of records saved to the database, most of which should be in the same tenant_5 schema, I still have some records (around 1-2k) that were not saved in the correct schema, creating some "noise" issues.
My question: do you think that @raymondfeng's example solution is "bulletproof" against this connection pool issue? Using different tenantDatasources, can I increase connectionLimit to 5?
Best regards, and thanks for LB4!
@fredvhansen Your solution is problematic.
- `use tenant_x` only configures one connection from the connection pool maintained by the mysql connector.
- [...] `use tenant_x` due to the async nature.

My example multi-tenancy application has completely isolated datasources for each tenant. The action in the sequence/interceptor can enforce the tenancy by setting different bindings to control which datasources are used.
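The first problem can be reproduced with a toy pool. This is only a simulation under stated assumptions: `ToyConnection` and `ToyPool` are hypothetical, each connection keeps its own current schema the way a database session does, and the pool hands out connections round-robin, which is a simplification of how a real pool picks a free connection.

```typescript
// A toy connection pool for illustration; it is not the mysql connector.
// Each connection remembers its own current schema, as a real session does.
class ToyConnection {
  currentSchema = 'main';
  written: string[] = [];

  execute(statement: string): void {
    const use = /^use (\w+)$/i.exec(statement);
    if (use) {
      this.currentSchema = use[1] ?? this.currentSchema;
    } else {
      // Record which schema the statement actually ran against.
      this.written.push(`${this.currentSchema}: ${statement}`);
    }
  }
}

class ToyPool {
  private next = 0;
  readonly connections: ToyConnection[];

  constructor(size: number) {
    this.connections = Array.from({length: size}, () => new ToyConnection());
  }

  // Hand out connections round-robin, one per statement (an assumption).
  execute(statement: string): ToyConnection {
    const conn = this.connections[this.next];
    if (!conn) throw new Error('The pool has no connections');
    this.next = (this.next + 1) % this.connections.length;
    conn.execute(statement);
    return conn;
  }
}

const pool = new ToyPool(2);
pool.execute('use tenant_5');            // runs on connection 0
const used = pool.execute('INSERT ...'); // runs on connection 1
```

In this simulation the insert is recorded against `main` rather than `tenant_5`, because the `use` statement and the insert were executed on different connections.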
If overhead is a concern, there is a possible solution for pooling datasources - see https://github.com/strongloop/loopback-next/pull/5681
@raymondfeng, thank you for your prompt answer! (And yes, I have known it's problematic with async since the beginning; from time to time I was googling new topics.) I already converted/merged my app with your example this weekend, and it is working well in the dev environment.
Personal question: I don't understand how
```typescript
.bind('datasources.db')
.toAlias(`datasources.db.${tenant.id}`); // where id=1
```
binds to Db1DataSource with "datasources.config.db1".
But this works! Thanks again!
@bajtos This is great. The Datasource-based tenant isolation is exactly what I was looking for. However, I want to understand why you're checking for the cached datasource in the `Context` instance injected via `@inject.context()` when you can directly check the `Application` instance injected via `@inject(CoreBindings.APPLICATION_INSTANCE)`?
I just hope I'm not missing something related to the binding scope since all we're intending for the cached datasources is for them to be SINGLETON.
+1 for providing this kind of functionality in Loopback in a defined way to guide the user's implementation.
Could eventually support different implementations such as
An example of logical isolation could be the $owner role in LB3. However, I consider it incomplete, since it only applies to instance methods through the usage of modelId. Extra work is needed to isolate generic CRUD queries (find, update, create, etc.) using a common API that automatically filters responses and interactions according to the logged-in user's token.
This could be a really innovative solution that is up to today's standards.
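As a rough illustration of the automatic filtering idea, the sketch below forces a tenant condition into every query filter. The `Filter` and `Where` types are deliberately minimal and `scopeToTenant` is a hypothetical helper; LoopBack's own filter types are richer, and the `tenantId` property name is an assumption.

```typescript
// Minimal filter types for illustration; LoopBack's own Filter/Where
// types are richer, and this helper is hypothetical.
type Where = Record<string, unknown>;

interface Filter {
  where?: Where;
  limit?: number;
}

// Force every query to be scoped to the caller's tenant. The tenant
// condition is applied last so a client-supplied tenantId cannot win.
function scopeToTenant(filter: Filter | undefined, tenantId: string): Filter {
  return {
    ...filter,
    where: {...(filter?.where ?? {}), tenantId},
  };
}

// A client tries to read another tenant's data; the scope overrides it.
const scoped = scopeToTenant(
  {where: {name: 'Tom', tenantId: 'other'}, limit: 10},
  'acme',
);
```

A helper like this would be applied in one shared place (for example a base repository or an interceptor) so that individual controllers cannot forget it.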
I have an error when I try to use multi-tenancy: any solution ? Thanks
As it complains, the base class only accepts two args. If you meant to configure the dataSource with `schema`, you need to change the `dataSource` in the constructor before calling `super`.
@raymondfeng I can call dataSource before super: Any example please?
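You can define a function such as:
```typescript
function updateDataSource(dataSource: juggler.DataSource): juggler.DataSource {
  // adjust the datasource settings (e.g. the schema) here, then return it
  return dataSource;
}
```
Then in the constructor:
```typescript
super(entityClass, updateDataSource(dataSource));
```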
@raymondfeng I tried to run your example, but it didn't store `tenantId`, although I passed it correctly!
```json
{
  "ids": {
    "User": 4
  },
  "models": {
    "User": {
      "1": "{\"tenantId\":\"\",\"name\":\"Tom\",\"id\":\"1\"}",
      "2": "{\"tenantId\":\"\",\"name\":\"Red\",\"id\":\"2\"}",
      "3": "{\"tenantId\":\"\",\"name\":\"Roy\",\"id\":\"3\"}"
    }
  }
}
```
Have you made a final decision on enabling multi-tenancy in LB? I'll start implementing it based on @raymondfeng's example (but using JWT & PostgreSQL), so I expect I will need some help in the coming days. Do you prefer that I ask my questions here or on Stack Overflow (under the loopbackjs tag)?
@bajtos Thank you for the above inputs. But how can we use the same Datasource to connect to multiple schemas (with schema selection done at runtime), so that the same connection pool is used and the number of connections to the database is limited? We have 500+ schemas to connect to.
Multi-tenancy and dynamic datasources can be handled by datasource-based tenant isolation. Check the loopback4-multi-tenancy package. It may help.
Recently, several people asked about implementing dynamic schema selection to enable schema-based multi-tenancy, where tenant isolation is achieved via DDL schemas. (If you are not familiar with DDL schemas, you can learn the basics e.g. in the PostgreSQL docs.)
I am opening this Epic to discuss possible solutions, implement necessary improvements and document how to implement multi-tenancy, possibly including an example app.
Related discussions:
Aspects to consider:
lb4 datasource
@loopback/boot
(if needed) or move the registration to a different place (e.g. from a datasource file to a boot script)