Closed bajtos closed 5 years ago
@bajtos thank you for starting the discussion around NoSQL backend support for relations; I get the gist of your proposal and since we're not using the LB3 relation engine, it's logical to introduce different flavours of relations for NoSQL DBs.
I would like to add, though, that the bulk of the issues that arose with MongoDB and relations in LB4 have to do with the `strictObjectIdCoercion` flag and how the connector treats ID values. Maybe it's worth testing this with Cloudant to see the behaviour there as well.
For example, instead of storing a foreign key in the target model, we can store id of related model(s) in the source model and use optimistic locking scheme to enforce the constraints
The `referencesMany` and `referencesOne` relations store FKs with the source model. For example:
Customer
- publicProfileId (a customer has a public profile)
Customer
- emailIds (a customer has multiple emails)
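The layout above can be sketched in plain TypeScript. This is only an illustration of where the keys live, not LoopBack's relation API: the `PublicProfile` and `Email` shapes and the `resolveEmails` helper are hypothetical names introduced here.

```typescript
// Hypothetical model shapes; only publicProfileId and emailIds come from the example above.
interface PublicProfile {
  id: string;
  displayName: string;
}

interface Email {
  id: string;
  address: string;
}

interface Customer {
  id: string;
  name: string;
  publicProfileId?: string; // referencesOne: the FK lives on the source model
  emailIds: string[]; // referencesMany: a list of FKs on the source model
}

// Hypothetical helper: look up the referenced emails for one customer.
// Ids that point to a missing record are skipped, because nothing in this
// sketch enforces referential integrity.
function resolveEmails(customer: Customer, emails: Map<string, Email>): Email[] {
  return customer.emailIds
    .map(id => emails.get(id))
    .filter((email): email is Email => email !== undefined);
}
```

Note that a dangling id in `emailIds` is dropped without any error, which is the kind of "weak relation" behaviour discussed in this thread.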
See my comment at https://github.com/strongloop/loopback-next/issues/1718#issuecomment-446906249. I think we need to look at the bigger picture first.
Based on @raymondfeng's comment above and the discussion we had elsewhere, I am proposing the following:
Treat 1-N relation as a concept that requires different implementation (schema design) depending on the database used. Explain this in our documentation for relations.
Make it very clear in our documentation that hasMany and belongsTo work best with SQL databases that are able to enforce referential integrity. When these relations are used with NoSQL, referential integrity is not enforced and we end up with a "weak relation".
In the docs for hasMany and belongsTo, explain to NoSQL users that hasMany/belongsTo relations are not suited for NoSQL databases and that we are working on a better solution.
Create a new section (or even a new page) in our docs for relations and explain to users which relation type to choose depending on which database they use.
For example, if you are building a 1-N relation: with SQL, use hasMany; with Cloudant, use referencesMany; with MongoDB, use embedsMany; etc.
We should verify that our recommended solution is actually a good one that works well for the target database. This will most likely require research on our side, as we don't have deep knowledge of different NoSQL databases.
Create a new spike user story for each NoSQL database we have an official connector for. In this spike, we will research what tools are offered for ensuring referential integrity and come up with a proposal on how LoopBack applications should implement 1-N (hasMany), N-1 (belongsTo) and 1-1 (hasOne/belongsTo) relations. For example, we can recommend referencesMany or embedsMany, but we may also find out that a completely new relation type is needed.
Based on the outcome of these spikes, create user stories to implement the missing relation types identified as needed for the databases we support. These stories should cover both the implementation and documentation updates. For many (if not all) relation types, a spike story to figure out implementation details may be needed first.
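The example mapping from the proposal above (SQL to hasMany, Cloudant to referencesMany, MongoDB to embedsMany) could be written down as a simple lookup. This is a sketch only: the pairs are the tentative examples from this comment, which the spikes still have to confirm, and the type and function names are hypothetical.

```typescript
// Tentative pairs taken from the example in the proposal; not verified recommendations.
type Backend = 'sql' | 'cloudant' | 'mongodb';
type OneToManyRelation = 'hasMany' | 'referencesMany' | 'embedsMany';

const oneToManyByBackend: Record<Backend, OneToManyRelation> = {
  sql: 'hasMany',
  cloudant: 'referencesMany',
  mongodb: 'embedsMany',
};

// Hypothetical helper that a docs page or a CLI prompt could use.
function recommendOneToMany(backend: Backend): OneToManyRelation {
  return oneToManyByBackend[backend];
}
```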
I am going to convert this GH issue into an Epic.
Converting back into issue in favor of a newly created epic #2331
Based on @raymondfeng's comment above and the discussion we had elsewhere, I am proposing the following:
Great summary. +1.
- Treat 1-N relation as a concept that requires different implementation (schema design) depending on the database used. Explain this in our documentation for relations.
- Make it very clear in our documentation that hasMany and belongsTo work best with SQL databases that are able to enforce referential integrity. When these relations are used with NoSQL, referential integrity is not enforced and we end up with a "weak relation".
- In the docs for hasMany and belongsTo, explain to NoSQL users that hasMany/belongsTo relations are not suited for NoSQL databases and that we are working on a better solution.
Moved these steps into a new issue: Document hasMany/belongsTo/hasOne limitations for NoSQL databases #2340
Create a new section (or even a new page) in our docs for relations and explain to users which relation type to choose depending on which database they use.
See #2341
Create a new spike user story for each NoSQL database we have an official connector for. In this spike, we will research what tools are offered for ensuring referential integrity and come up with a proposal on how LoopBack applications should implement 1-N (hasMany), N-1 (belongsTo) and 1-1 (hasOne/belongsTo) relations. For example, we can recommend referencesMany or embedsMany, but we may also find out that a completely new relation type is needed.
Created the following spike stories:
I feel the discussion is over and since we have a list of small actionable tasks to follow up, I am closing this GH issue as done.
Our current implementation of model relations (has-many, has-one, belongs-to) is deeply rooted in SQL and based on the assumption that the database takes care of referential integrity for us.
Example 1: "Customer has many Order instances" and "Order belongs to Customer". When creating a new Order instance, we expect the database to verify that `Order.customerId` matches the id value of an existing Customer record. We don't have any reliable & atomic way to do this check on the LoopBack side.
Example 2: "Customer has one Credentials instance". When creating a new Credentials instance, we expect the database to verify that there are no other Credentials instances already created for the user. We don't have any reliable & atomic way to do this check on the LoopBack side.
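To illustrate why an application-level check is not enough for Example 2, here is a minimal sketch. It assumes a hypothetical in-memory `CredentialsStore` standing in for a datastore without a UNIQUE constraint, and it interleaves two requests by hand so the outcome is deterministic.

```typescript
// Hypothetical in-memory stand-in for a datastore without UNIQUE constraints.
interface Credentials {
  userId: string;
  passwordHash: string;
}

class CredentialsStore {
  private rows: Credentials[] = [];

  existsFor(userId: string): boolean {
    return this.rows.some(row => row.userId === userId);
  }

  insert(row: Credentials): void {
    this.rows.push(row);
  }

  countFor(userId: string): number {
    return this.rows.filter(row => row.userId === userId).length;
  }
}

const credentialsStore = new CredentialsStore();

// Two requests for the same user arrive at about the same time.
// Both run the application-level check before either one writes.
const requestAMayCreate = !credentialsStore.existsFor('user-1');
const requestBMayCreate = !credentialsStore.existsFor('user-1');

if (requestAMayCreate) {
  credentialsStore.insert({userId: 'user-1', passwordHash: 'hash-a'});
}
if (requestBMayCreate) {
  credentialsStore.insert({userId: 'user-1', passwordHash: 'hash-b'});
}

// Both checks passed, so the "has one" rule is now violated.
const duplicateCount = credentialsStore.countFor('user-1');
```

The check and the write are two separate steps, so nothing stops a second request from slipping in between them. A database-level constraint closes that gap because the check and the write happen as one atomic operation.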
SQL databases provide FOREIGN KEY and UNIQUE constraints that work great for this flavor of relations.
The situation becomes more tricky when we try to map this approach to NoSQL databases. Many NoSQL databases do not provide FOREIGN KEY and UNIQUE constraints; this is often a limitation imposed by the CAP theorem.
For example, it's not possible to enforce a UNIQUE constraint when the model data is stored on multiple physical machines and a network partition occurs (a node performing a write operation is not able to reach other nodes because of networking problems, and thus it cannot verify that the new value does not violate the uniqueness constraint for records stored on those nodes).
I think we should rethink how we model relations and offer different flavors optimized for different backends.
For example, instead of storing a foreign key in the target model, we can store the id of the related model(s) in the source model and use an optimistic locking scheme to enforce the constraints.
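A minimal sketch of that idea, assuming a hypothetical in-memory store where each document carries a revision number (`rev`) and a write is accepted only if the caller read the latest revision. None of these names are LoopBack or connector APIs.

```typescript
// Hypothetical document shape: related ids live on the source model.
interface CustomerDoc {
  id: string;
  emailIds: string[];
  rev: number; // bumped on every successful write
}

class InMemoryCustomerStore {
  private docs = new Map<string, CustomerDoc>();

  create(id: string): CustomerDoc {
    const doc: CustomerDoc = {id, emailIds: [], rev: 1};
    this.docs.set(id, doc);
    return {...doc, emailIds: [...doc.emailIds]};
  }

  get(id: string): CustomerDoc {
    const doc = this.docs.get(id);
    if (!doc) throw new Error(`Customer ${id} not found`);
    // Return a copy so callers cannot mutate stored state directly.
    return {...doc, emailIds: [...doc.emailIds]};
  }

  // Compare-and-set: returns null when the caller's revision is stale.
  replace(updated: CustomerDoc): CustomerDoc | null {
    const current = this.docs.get(updated.id);
    if (!current) throw new Error(`Customer ${updated.id} not found`);
    if (current.rev !== updated.rev) return null;
    const next: CustomerDoc = {
      id: updated.id,
      emailIds: [...updated.emailIds],
      rev: current.rev + 1,
    };
    this.docs.set(next.id, next);
    return {...next, emailIds: [...next.emailIds]};
  }
}

// Read, modify, write; on conflict, re-read and try again.
function addEmail(
  store: InMemoryCustomerStore,
  customerId: string,
  emailId: string,
  maxRetries = 3,
): CustomerDoc {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const doc = store.get(customerId);
    if (doc.emailIds.indexOf(emailId) !== -1) return doc; // already linked
    doc.emailIds.push(emailId);
    const saved = store.replace(doc);
    if (saved !== null) return saved;
  }
  throw new Error(`Could not update customer ${customerId} after ${maxRetries} attempts`);
}
```

Because the list of related ids is part of a single document, a concurrent writer cannot silently overwrite it: the stale write is rejected and the caller re-reads before trying again. Whether a given database offers such a revision check is one of the things the per-connector research would need to establish.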
We can even store the related models as embedded documents; this should work great for document databases.
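As a rough illustration of the embedded variant, the sketch below keeps the related records inside the parent document. The `EmbeddedEmail` and `CustomerWithEmails` shapes and the `addEmbeddedEmail` helper are hypothetical, and how the resulting document is persisted is left out.

```typescript
// Hypothetical shapes: the emails live inside the customer document itself.
interface EmbeddedEmail {
  address: string;
  verified: boolean;
}

interface CustomerWithEmails {
  id: string;
  name: string;
  emails: EmbeddedEmail[];
}

// Returns a new document with the email appended. The uniqueness check only
// has to look inside this one document, not across other records.
function addEmbeddedEmail(
  customer: CustomerWithEmails,
  email: EmbeddedEmail,
): CustomerWithEmails {
  const alreadyPresent = customer.emails.some(e => e.address === email.address);
  if (alreadyPresent) {
    throw new Error(`Email ${email.address} is already embedded in customer ${customer.id}`);
  }
  return {...customer, emails: [...customer.emails, email]};
}
```

With this layout there is no foreign key to validate, since the related data cannot exist without its parent document.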
Related issues & discussions:
@strongloop/loopback-next @strongloop/loopback-maintainers thoughts?