Do we want a new Juggler?

I've been thinking about the persistence story in LoopBack over the last few months and have been wondering if the juggler is the right approach for the framework.

Warning: Opinions Ahead -- Please don your 3D glasses now.

Background

In the current iteration of loopback and loopback-datasource-juggler, we define relations between Models to provide much of the interoperability that users have come to rely on for making their REST APIs. These relations are supported by a Domain-Specific Language (DSL) that leverages that metadata to generate queries against datasources.

The Everything ORM

In many cases, our mapping of the DSL has required us to make an ORM-like set of facilities that generate query strings (in the case of SQL) or translate our DSL into one understood by the target database/database drivers (MongoDB, as an example). Support for this idea falls short of what many in the development community would expect for each of the individual use cases; SQL queries are often limited to primitive, inefficient operations and some NoSQL query objects do not accurately represent their original intention when translated accordingly. In many cases, users must fall back on basic methods that do not take advantage of the metadata our relations and DSL represent.
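To make the translation burden concrete, here is a minimal sketch of what a connector has to do with a juggler-style filter. The `WhereFilter` type and the `toSql` and `toMongo` functions are hypothetical names made up for this illustration, not the juggler's actual code, and only equality, `gt` and `limit` are handled.

```typescript
// Hypothetical, heavily simplified filter shape (equality and `gt` only).
type Condition = string | number | { gt: number };
interface WhereFilter {
  where: Record<string, Condition>;
  limit?: number;
}

// Translate the filter into a parameterized SQL statement.
// Note: a real connector must also escape table and column names.
function toSql(table: string, filter: WhereFilter): { sql: string; params: (string | number)[] } {
  const params: (string | number)[] = [];
  const clauses: string[] = [];
  for (const field of Object.keys(filter.where)) {
    const cond = filter.where[field];
    if (typeof cond === 'object') {
      params.push(cond.gt);
      clauses.push(`${field} > ?`);
    } else {
      params.push(cond);
      clauses.push(`${field} = ?`);
    }
  }
  let sql = `SELECT * FROM ${table}`;
  if (clauses.length > 0) sql += ` WHERE ${clauses.join(' AND ')}`;
  if (filter.limit !== undefined) sql += ` LIMIT ${filter.limit}`;
  return { sql, params };
}

// Translate the same filter into a MongoDB-style query document.
function toMongo(filter: WhereFilter): { query: Record<string, unknown>; limit?: number } {
  const query: Record<string, unknown> = {};
  for (const field of Object.keys(filter.where)) {
    const cond = filter.where[field];
    query[field] = typeof cond === 'object' ? { $gt: cond.gt } : cond;
  }
  return { query, limit: filter.limit };
}

const filter: WhereFilter = { where: { status: 'active', age: { gt: 21 } }, limit: 10 };
const sqlResult = toSql('users', filter);
const mongoResult = toMongo(filter);
```

Even this toy version needs one code path per backend for every operator; JOINs, nested conditions and database-specific features multiply that work for each connector.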

In each of these cases, whether or not it was our intent, we have implied to the community that we would shoulder the burden of providing a reasonably-complete and efficient set of query-generation tools to work hand-in-hand with our DSL to provide fine-grained control to various datasources.

In my opinion, the very idea of having our own DSL gives this impression; what purpose does it serve otherwise?

The Extensibility Problem (Routing around the Damage)

In the case of loopback-next, our extreme extensibility is a massive advantage, but it comes with a significant "downside"; members of the community will simply avoid the Juggler if it doesn't meet their needs, and those that find it sufficient will keep using it.

"Great! Why's that a problem?"

Our Choice Informs Our Design

The decision of which approach we consider to be the best practice will have a direct effect on the design of the other core components of the framework. As an example, one of the other things that loopback 3.x gives you with its relations is automatic generation of REST APIs based on your models. "Foo hasMany Bar"? Great! We'll generate all of the /foos/{id}/bars routes for you!
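As a rough illustration of how relation metadata can drive route generation, the sketch below derives nested paths from a single hasMany definition. `HasManyRelation`, `RouteDef` and `routesForHasMany` are hypothetical names, and the route list is an illustrative subset rather than the exact set LoopBack 3 generates.

```typescript
// Hypothetical relation metadata; LoopBack 3 stores something richer.
interface HasManyRelation {
  source: string; // plural REST name of the owning model, e.g. "foos"
  target: string; // plural REST name of the related model, e.g. "bars"
}

interface RouteDef {
  verb: 'GET' | 'POST' | 'DELETE';
  path: string;
}

// Derive nested REST routes for a single hasMany relation.
function routesForHasMany(rel: HasManyRelation): RouteDef[] {
  const base = `/${rel.source}/{id}/${rel.target}`;
  return [
    { verb: 'GET', path: base },
    { verb: 'POST', path: base },
    { verb: 'DELETE', path: base },
    { verb: 'GET', path: `${base}/{fk}` },
    { verb: 'DELETE', path: `${base}/{fk}` },
  ];
}

const fooBarRoutes = routesForHasMany({ source: 'foos', target: 'bars' });
```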

If we decide that there will be no Juggler for 4.x, then that probably means no relations, too. That doesn't mean we won't have a story for generating those API definitions in some other way, but it does give you two very different approaches.

"We can keep relations without using the DSL or the juggler for persistence!", you might say. But would you want to create all of the relationships between your Models in a declarative format just to generate the shape of your REST APIs? Typing /foos/{id}/bars into a Swagger editor isn't any more difficult than making those relations. The combined value of getting that for free as well as being able to leverage that metadata to generate queries against datasources is what makes the sales pitch for the idea of having those relations to begin with.

DSL? I Prefer Cable, Thanks

Convincing community developers to invest great effort into creating these middleware layers that translate our DSL into commands for various database drivers is a tough sell when ready-made ORMs can be liberally sprinkled into your code in loopback-next.

Many of them come with their own Model and relationship engines (sequelize being the example everyone is tired of me bringing up whenever we discuss this), and since they tend to specialize in their chosen domain, they're often very efficient, well-written ORMs that do one (or a handful) of things extremely well.

Competing with this would require us to once again take on the burden of building all of these connectors ourselves, only this time, we'd have to make damn sure that we're making solid-quality SQL statements, and properly translating our MongoDB queries because developers will simply swap to using the underlying drivers if they're not already in too deep. Combine this with the fact that we just don't have the resources to build all of this in a timely fashion and the value-for-money of this proposition suddenly seems thin at best.

What Other Circus Acts Are We Good At?

The juggler is definitely something that distinguishes us from other frameworks that are value-adds on Express.js, Koa.js and so on. I find myself wondering what other killer features we could provide that existing frameworks don't. It might be that what we need to differentiate ourselves in a meaningful way doesn't even have to be a particular feature; if our framework and tooling do a better, faster job of getting people from idea to API, and give you easy flexibility to add all sorts of awesome components, wouldn't that be our killer feature?

Trade Chainsaws for Bowling Pins

It might be that someone who's much more creative/intelligent than me knows of a way we can have our cake and eat it too; if a design for the new juggler provides a low-effort API to implement that bridges the gap between driver and DSL, then I'm definitely on board with the idea.

You may now remove your 3D glasses.

I definitely want to hear everyone's thoughts on this, both the validity/invalidity of my concerns, as well as approaches we could take to solve this problem. Thanks for reading!

The current version of loopback-datasource-juggler is overloaded with many responsibilities, such as:

A type system for model definitions
A JSON-based language for typical query and mutation
Validation
Relation (may go beyond models backed by the same database)
Navigation and inclusion of related data following relations
Bind models to datasources to mix in CRUD
Map CRUD operations to connectors (which in turn implement the translation from our JSON-based DSL to SQL or NoSQL commands)
Proxy for other forms of APIs, such as REST/Swagger, SOAP WS, or gRPC
Intent-based operation hooks

When we discuss what should or should not be supported out of the box in LB next, these different perspectives should be separated to avoid confusion.

We have planned to refactor the juggler into separate modules. The @loopback/repository package for loopback-next is also a starting point for the effort.

Let me start by admitting my mixed love/hate relationship with juggler.

On the brighter side

Having the standard ORM in LoopBack allowed us to implement great tools simplifying building of LoopBack apps:

Conventions and a project layout that's familiar to all LoopBack developers. You can take any LoopBack developer, show them a project scaffolded by lb app, and they will immediately know where to look for model definitions and how to read them.
Because LoopBack understands juggler models, we can use model metadata to describe the format of API requests and responses. No need to maintain the same list of model properties in multiple places (Swagger documents, Model configurations).
Our tooling like lb model can stay relatively simple, because there is only one ORM to support.
By having a rich CRUD API (including relations) exported via REST, it's super easy to quickly build a prototype backend, allowing developers to focus on figuring out the front-end UX, which is much more important in new projects.
Code generators like loopback-sdk-angular, loopback-sdk-xamarin, TypeScript/Angular2 SDK have a lot of insight into application REST API and data models, which allows them to provide richer clients.
Change replication and offline sync would not be possible if we didn't have a single API for all databases, including local storage in the browser.

On the darker side:

There is a reason why the SQL language cannot be used for NoSQL databases: each NoSQL database has a different approach to address the CAP theorem and therefore requires a different programming model and mindset. For example, MongoDB prefers partial updates using operators like $inc in order to achieve data consistency. OTOH, CouchDB/Cloudant does not support partial updates OOTB and maintains a revision (SHA hash) of every document to guarantee consistency.

Juggler is trying to support all these different programming models, and as a result, it has to pick only a small subset of features - those that are supported by all backends. I am afraid this subset is so small, that it becomes an obstacle very soon in the project/application development cycle.
By abstracting away important database-specific constraints, we create a false impression that writing a robust backend is easy and does not require advanced database-specific knowledge. As a result, users are writing naive code that's prone to race conditions and may lead to data loss when the server comes under heavier load.
We make it difficult to use databases the way they are designed. For example, we still don't have any well-documented, easy-to-use way to run SQL transactions. MongoDB users should be using update operators, but that requires extra configuration and breaks LoopBack's model validation. Applications storing data in NoSQL/document databases should have a denormalized schema relying on embedded models in order to fully leverage the benefits of NoSQL, but our support for embedded models is poor and broken.
Relations without SQL JOIN support - don't even get me started on that! Not only is the performance suboptimal, but since we are not running in transactions by default, each of the multiple queries executed by relation methods can work on a different version of the data.
Inconsistent behavior of the built-in CRUD API. Until recently, we had operations that behaved differently depending on the connector/database. Some connectors performed a full replace of all model properties (deleting those not included in the payload), others performed a partial update (preserving values of the properties not included in the request payload). I think it's likely there are more edge cases like this buried in our codebase.
CRUD API that's too difficult to reliably implement in all databases and at the same time not useful enough. For example, our current PATCH /api/{model}/:id function is prone to race conditions: it's implemented as two database queries (findById + updateAttributes) and it's returning the data from the first findById with the changes applied - this data may be different from the actual data held by the database.
Many parts of juggler are not production-grade. For example, autoupdate/automigrate operate on live data and don't provide any preview of the changes to be made. No sane person would run that in production! Not to mention that certain changes (e.g. column renames) cannot be handled by an automated tool and always require a bit of manual code added by the developer. A more common approach is to (pre)generate migration scripts that are reviewed and edited by developers, then tested in all environments before they are finally executed on production. It should be reasonably easy to autogenerate such scripts using the functionality powering autoupdate, but we never made it a priority.
Implementing a good database connector requires advanced knowledge of that particular database. We don't have experts for all databases we support, therefore our connectors are often mapping juggler API/DSL into database-specific queries naively, sometimes missing advanced database-specific features (like PostgreSQL support for JSON data), sometimes introducing subtle bugs (like the initial version of our Cloudant connector that was not passing _rev to clients IIRC).
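The PATCH race condition described in the list above can be replayed by hand. This is a minimal in-memory sketch, not the juggler's code: `table`, `findById` and `updateAttributes` are stand-ins, and it assumes the write step sends the whole merged instance back to the database.

```typescript
// In-memory stand-in for a database table keyed by id (an assumption for illustration).
interface AccountRow { id: number; balance: number; note: string }
const table: Record<number, AccountRow> = { 1: { id: 1, balance: 100, note: '' } };

// Step 1 of the non-atomic PATCH: read the whole row.
function findById(id: number): AccountRow {
  return { ...table[id] };
}

// Step 2: merge the patch into the copy read earlier and write it all back.
function updateAttributes(snapshot: AccountRow, patch: Partial<AccountRow>): AccountRow {
  const updated = { ...snapshot, ...patch };
  table[updated.id] = updated;
  return updated;
}

// Two requests interleave: both read before either writes.
const readA = findById(1);
const readB = findById(1);
const resultA = updateAttributes(readA, { balance: 150 });
const resultB = updateAttributes(readB, { note: 'vip' });

// Request B wrote back its stale copy, so A's balance change is lost,
// and the response A received no longer matches the stored row.
const finalRow = table[1];
```

After this interleaving the stored row has `balance: 100` and `note: 'vip'`, while request A was told the balance is 150. An atomic update operator or a transaction would avoid the lost write.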

My takeaway

In LB Next/4.0, we are decoupling REST API from the ORM API. This will take away benefits like single definition for both REST API and ORM, while bringing other advantages like giving developers tighter control of their public REST API.

In that light, I think we are pretty much ready to abandon juggler, if we can find a way to preserve the following features:

Tooling for creating/editing project artifacts (models, database connection configurations, etc.) that supports different ORM/database drivers.
Single project layout convention mostly independent of the ORM/database driver used.
Support for IBM databases (db2, cloudant, etc.)

While we are discussing alternative ORMs for SQL backends, I'd like to bring the following projects to your attention:

http://knexjs.org/ - SQL query builder, a lower-level building block
https://www.npmjs.com/package/bookshelf - based on Knex
http://vincit.github.io/objection.js/ - another ORM based on Knex

ORM

IMO the biggest challenge in building an ORM is that we either have to build a "perfect" one or not do it at all.

By "perfect" I mean:

We leverage database-specific features to optimize performance. According to community feedback, the best-known example of this is building SQL JOIN queries for relations.
A complete relation system, as I imagine it, would support 1:1, 1:n and n:m mappings for "has", "embeds", "reference", "hasThrough" and "referenceThrough". For 1:1 and 1:n mappings we would have to implement the corresponding validation rules, which would be a burden to do with hooks and leads to the same transaction concern that @bajtos raises.
We need to document clearly what our ORM supports and guarantees, and what remains the users' responsibility. We can word it politely to soften the disappointment, but being explicit could prevent users from taking their design far in the wrong direction and then coming back to us to patch the connector until it is little more than a wrapper around the driver.
Both the standards we set across all databases in one ORM and the criticism of any piece of it can be fairly subjective, so we may want statistical data to help with the decision.

Compared with existing ORMs from the community, it's not hard for us to come up with a better design and implementation in a specific area, but given the resources we have, my concern is how much time we would need to build an OVERALL better ORM... And if we become more determined about closing feature requests that are not reasonable for us to support, would that benefit users more than telling them from the beginning to spend some time investigating the most appropriate libs/modules for their product? Another bottleneck of developing with the current juggler is that some standards (e.g. ad-hoc sort) are too strict to hold across 10+ connectors. If we still expect unified behaviour, I would suggest officially maintaining connectors only for IBM databases and the most popular ones: db2, cloudant, mysql, mongodb. Actually, considering the incoming requests from paying customers, this list is still growing :(

Sugar functions

IIRC we have a story discussing simplifying the functions provided in dao.js. I understand that sugar functions save people time to some extent, but again... think of the effort to maintain them. Some similar functions leave people confused about how they differ, which becomes another documentation overhead and a compatibility debate * N (the connectors we support).

Remote method hook

Actually it's now implemented in loopback core, I love the hook system and I assume loopback-next already implements it.

Scope

People may still want to have a set of APIs organized under a certain name (or tag, say) that is also easy to reuse when extending a model.

Inclusion, Getter and Setter, 2nd level Cache

Inspired by this article and the "updateOnly" PR recently merged into juggler, I think what our current resources limit is the work of building SQL/NoSQL queries, but we still need a module that serves as a middle layer between the modelDef and a db's driver functions.

E.g. in a model attached to a couchdb/cloudant datasource, the _rev property only shows up in update/replace/delete methods but not in create. This means that, given a set of properties defined in an entity, the user may apply different rules based on the API type.
E.g. a SQL UPDATE query returns the number of affected rows instead of the updated data, but the user still needs a model instance returned.
The current juggler stores cached related items and only refreshes them when the user asks for it.
Similar to cached relations, the module can provide an inclusion system that integrates a model instance with its related items. The user writes the code that retrieves the related data; the module handles caching and refreshing, and structures the related object.
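The first two examples above could look roughly like this in a thin layer between the model definition and the driver. Everything here is a hypothetical sketch: `NoteDoc`, `driverInsert`, `driverUpdate` and `updateNote` are invented names, the "driver" is an in-memory object, and `_rev` is simplified to a counter instead of a real CouchDB revision string.

```typescript
// Hypothetical model with a CouchDB/Cloudant-style revision token.
interface NoteDoc { id: string; _rev: string; text: string }

// Per-operation shapes: create has no _rev yet, update must carry one.
type CreateInput = Omit<NoteDoc, '_rev'>;
type UpdateInput = Pick<NoteDoc, 'id' | '_rev'> & Partial<NoteDoc>;

// Stand-in for a driver. Like SQL UPDATE, it reports only an affected-row count.
const rows: Record<string, NoteDoc> = {};
function driverInsert(input: CreateInput): void {
  rows[input.id] = { ...input, _rev: '1' };
}
function driverUpdate(input: UpdateInput): number {
  const current = rows[input.id];
  if (!current || current._rev !== input._rev) return 0; // stale or missing
  rows[input.id] = { ...current, ...input, _rev: String(Number(current._rev) + 1) };
  return 1;
}

// The middle layer turns the count back into a model instance.
function updateNote(input: UpdateInput): NoteDoc {
  if (driverUpdate(input) === 0) throw new Error('Conflict: stale _rev or missing row');
  return { ...rows[input.id] };
}

driverInsert({ id: 'n1', text: 'draft' });
const saved = updateNote({ id: 'n1', _rev: '1', text: 'final' });
```

The point of the sketch is only the division of labour: the types encode which properties each operation accepts, and the layer, not the driver, is responsible for returning a full instance.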

To echo @jannyHou's comments, I propose that we first build a list of features/responsibilities for the current loopback-datasource-juggler to better understand what it does today so that we can better decide what it should do/should not do for LoopBack next. We need to keep/improve the good parts and remove/fix the bad parts.

Having a big-bang/wholesale yes/no debate is NOT going to be very productive, IMO.

I don't think it was ever about all of the juggler; many of the parts of juggler v3 have already been spoken for as separate modules within loopback-next, like authentication. I'm mostly using the term juggler for the persistence and relations since they don't have their own names.

I do agree that we should make that list anyway, though figuring out exactly what is affected by the greater whole is difficult to talk about, and easier to demonstrate, which is why we're working on a "real" app to start testing out these use cases: https://github.com/strongloop/chit-chat

Sorry if this is out of scope for this issue, but I just want to ask whether auto-discovery of models and relations is still part of the planned feature set of the new Juggler (which it seems to be heading towards)?

Discovering models and relations based on existing tables is a major part of our current workflow with Loopback as we have hundreds of "old" tables that need CRUD APIs. Hand writing each model and method would make this framework almost unusable here. I saw someone else asking this in a referenced issue but he didn't get an answer (https://github.com/strongloop/loopback-next/issues/419#issuecomment-314490175)

@ExTheSea This is one of those design decisions that would be influenced by the way we choose to implement and support the persistence layer.

If we decide to continue providing our own ORM, it would mean that we would also be responsible for the discovery and migration stories that are a part of loopback@3.

My current proposal is to use mixins for popular ORMs, as well as templates to help auto-generate code for users based on their chosen protocol (REST, gRPC, MQTT, etc) and chosen mixin. We're currently hashing that out as a team and any feedback for either approach would be welcome.

If you have any questions about what my proposal would entail, just ask. :)

So, as a team, we came to a decision yesterday regarding our approach here and this is what we've come up with

Roadmap for Juggler

We will be keeping the Juggler as a part of LoopBack, but we will be constraining its scope for the next major release.

Planned Changes

Convert juggler to TypeScript
Remove some of the more exotic functions like replaceOrCreate and findOrCreate to simplify the API and to remove some of the potential for race conditions created by these non-atomic operations
Provide a common model schema as we currently do with LoopBack 3
Continue to provide basic querying support (filtering)
Add support for simple JOINs in SQL. This will also include an explanation of the limitations of the DSL, and that we will not support a SQL builder that can perform any and all arbitrary queries.
Continue to support the .query and .execute functions for using your own (SQL/MongoDB/...) commands
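To show why a find-then-create helper is prone to races, the sketch below replays two concurrent callers by hand. It is an illustration only: `users`, `findByEmail` and `createUser` are in-memory stand-ins, and the table is assumed to have no unique constraint on `email`.

```typescript
// In-memory stand-in for a table without a unique constraint (assumption).
interface UserRow { id: number; email: string }
const users: UserRow[] = [];
let nextId = 1;

function findByEmail(email: string): UserRow | undefined {
  return users.filter(u => u.email === email)[0];
}
function createUser(email: string): UserRow {
  const user = { id: nextId++, email };
  users.push(user);
  return user;
}

// A findOrCreate built from two separate calls, split into its steps so the
// interleaving of two concurrent requests can be replayed deterministically.
const foundByA = findByEmail('ann@example.com'); // nothing stored yet
const foundByB = findByEmail('ann@example.com'); // still nothing, A has not written
const userA = foundByA || createUser('ann@example.com');
const userB = foundByB || createUser('ann@example.com');
```

Both callers end up creating a row, so the table now holds two users with the same email. Avoiding that requires the database to enforce uniqueness or to offer an atomic upsert, which is exactly the kind of connector-specific work the simplified API tries to shed.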

Other ORMs

We will provide some tutorial materials on how to create your own mixins to make use of your own ORMs, though we will not provide templating support or other materials to ease in the use of those ORMs.

Additional Questions

cc @strongloop/loopback-devs

Will the new Juggler live in the monorepo?
Given that we are centralizing on the Juggler model concept, are we abandoning the idea of allowing users to "plug in" their own templating for the CLI, limiting that to controllers, or something else?

Another question: Will we constrain the number of relationships to something simpler than before?

+1 for having the new juggler live in the monorepo.
Extensibility for CLI is a nice to have (different kinds of apps / protocols / extension templates would benefit from this extensibility) but I don't think it should be a priority right now.
I would like to see the number of relationships simplified to start with and more can be added depending on use cases and needs. This should help achieve consistency, simplicity and maintainability.

+1 for having the new juggler live in the monorepo.
For relations, one way to simplify them is probably to separate the constraint from the relation: we would only have 1:m relations (hasMany embedsMany referenceMany) and apply a separate constraint layer to realize 1:1. Just a thought, need more time to think of it.
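One possible reading of that idea, as a sketch: relation metadata keeps a single 1:m shape, and cardinality becomes an optional constraint checked separately. `RelationDef`, `maxItems` and `checkCardinality` are hypothetical names, not an existing or planned API.

```typescript
// Hypothetical metadata: one 1:m relation shape plus an optional cardinality cap.
interface RelationDef {
  kind: 'hasMany' | 'embedsMany' | 'referencesMany';
  target: string;
  maxItems?: number; // constraint layer: 1 turns the 1:m relation into 1:1
}

// Check a candidate set of related ids against the constraint.
function checkCardinality(rel: RelationDef, relatedIds: string[]): boolean {
  return rel.maxItems === undefined || relatedIds.length <= rel.maxItems;
}

const orders: RelationDef = { kind: 'hasMany', target: 'Order' };
const profile: RelationDef = { kind: 'hasMany', target: 'Profile', maxItems: 1 };

const manyOrdersOk = checkCardinality(orders, ['o1', 'o2']);     // true
const twoProfilesOk = checkCardinality(profile, ['p1', 'p2']);   // false
```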

Thank you @kjdelisle for writing down the proposal, and @virkt25 and @jannyHou for your comments. I'd like to add a few more thoughts to consider.

First of all, I think we should make juggler a first-class package that can be used outside of LoopBack too. We have interesting features that are not available in other ORMs - see e.g. https://github.com/strongloop/loopback-next/issues/776#issuecomment-349976735 and the feature comparison between TypeORM and Juggler that @raymondfeng wrote but which I am not able to find now :( (@raymondfeng - could you please post link to your table here?) We should be promoting our ORM more too, so that when people learn that LoopBack uses Juggler as the default ORM, they won't think "why are they using this ORM I never heard of instead of ", but instead they will understand Juggler is a well-known fully-featured ORM and we have to pick one anyways.

Convert juggler to TypeScript

I want us to work on the "new" juggler incrementally. I really want to avoid the situation we have here in loopback-next, where we spent 12 months building a new version from scratch and there is still nothing that our users could use in production.

Instead, I am proposing the following approach:

Pick a few people that will become the new owners of juggler and will spend most of their work time on juggler. One person may be good enough for a start, but we need at least one other person that would be able to provide meaningful reviews for pull requests.
Let the new team go through the backlog of known issues and spend the first ~4 sprints (about two months) on fixing those bugs in the current code base. This will give them a better understanding of where we are right now and what the biggest pain points of the current code and design are. Because they are working in the current code base, the bug fixes will be released in 3.x versions and made immediately available to all LoopBack 3.x users. Yay!
Once the team is familiar with the current codebase, they can start working on a new major version, perhaps move the code to a monorepo and convert it to TypeScript. Even then we should not be rewriting any code from scratch. Instead, we should start with the current code base and perform refactorings to improve the code and the design, remove features that are too difficult to use and/or to maintain, etc. Any breaking change should be properly documented, so that when the time comes to publish a new release (ideally in less than 6 months after the team was formed), we can easily compile useful release notes and a migration guide.
We can iterate on this approach and release multiple semver-major versions before we consider most of the backwards-incompatible work done. The goal is to publish our changes as soon as is reasonable, so that people can start using our new code and we can get feedback as early as possible.

Will the new Juggler live in the monorepo? +1 for having the new juggler live in the monorepo.

I personally see a lot of value in having a monorepo that contains Juggler and all connectors we are maintaining. In my past experience, it was cumbersome to add new features to Juggler, because a PR to juggler would have to be accompanied by 10+ pull requests to our connectors to implement support for that new feature. Sharing the test suite between juggler and the connectors had its problems too: how often could we not land a pull request in one repository because the tests were failing until another pull request was landed somewhere else?

Having the ability to test all connectors together with any change made in juggler will simplify our life too, as we won't have to rely on cis-jenkins dependency-based-triggers anymore. (cis-jenkins has two issues: a) it can be slow to start downstream jobs b) test results are not visible to community (non-IBM) contributors ).

The downside is that running all connector tests will add significant time overhead to npm test and CI runs. However, I think this problem is solvable by CI tooling. For example, we could write a tool that will check git patch of the changes we are testing, decide which packages are affected (either directly or by changes in their dependencies) and then run the tests only for those affected packages.
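A minimal sketch of such a tool's core logic, under stated assumptions: the monorepo uses a `packages/<name>/...` layout, the list of changed files comes from something like `git diff --name-only`, and the dependency map is supplied by the caller. `packageOf` and `affectedPackages` are hypothetical names.

```typescript
// Map from each package to the packages it depends on (assumed input).
type DependencyGraph = Record<string, string[]>;

// Extract the package name from a path such as packages/<name>/lib/file.js.
function packageOf(filePath: string): string | undefined {
  const match = /^packages\/([^/]+)\//.exec(filePath);
  return match ? match[1] : undefined;
}

// Return changed packages plus everything that depends on them, transitively.
function affectedPackages(changedFiles: string[], deps: DependencyGraph): string[] {
  const affected: string[] = [];
  const queue: string[] = [];
  for (const file of changedFiles) {
    const pkg = packageOf(file);
    if (pkg !== undefined && queue.indexOf(pkg) === -1) queue.push(pkg);
  }
  while (queue.length > 0) {
    const pkg = queue.shift() as string;
    if (affected.indexOf(pkg) !== -1) continue;
    affected.push(pkg);
    // Any package that depends on `pkg` must be retested too.
    for (const name of Object.keys(deps)) {
      if (deps[name].indexOf(pkg) !== -1 && affected.indexOf(name) === -1) queue.push(name);
    }
  }
  return affected.sort();
}

const graph: DependencyGraph = {
  juggler: [],
  'connector-mysql': ['juggler'],
  'connector-mongodb': ['juggler'],
  filters: [],
};
const onlyMysql = affectedPackages(['packages/connector-mysql/lib/mysql.js'], graph);
const everything = affectedPackages(['packages/juggler/lib/dao.js'], graph);
```

With this graph, a change inside one connector selects only that connector, while a change in juggler selects juggler and both connectors but leaves `filters` alone.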

What I think is a more important question is whether Juggler and connectors should live in loopback's main monorepo, or whether they should have their own monorepo? If we want to promote Juggler as standalone ORM, then it may make more sense to let it have its own monorepo, own issue tracker, etc. (Another benefit of a different monorepo is that we can defer implementation of the CI tooling I mentioned above for a while, because npm test in loopback4 monorepo will stay fast).

Last but not least, I think we should find a new name for our ORM, perhaps one that's not so coupled with LoopBack. How about "Juggler ORM"? (I am already imagining a cheerful logo of a circus artist juggling with balls 🤹‍♀️🤹‍♂️, where each ball can be a logo of a different SQL/NoSQL database.) A few more alternatives that come to my mind: "Strong ORM" to keep StrongLoop's theme of prefixing modules with "Strong", "LoopBack ORM" to keep the association with LoopBack, or perhaps @loopback/juggler.

I would like to see the number of relationships simplified to start with and more can be added depending on use cases and needs. This should help achieve consistency, simplicity and maintainability. For relations, one way to simplify it is probably separating the constraint apart from relation, like we only have 1:m relations (hasMany embedsMany referenceMany) but apply another constraint layer to realize 1:1. Just a thought, need more time to think of it.

+1 for simplifying things. I think there will be many more opportunities to simplify things. For example, embedded relations have always had a lot of shortcomings, they may be a good candidate for removal too.

We discussed the next steps for this issue with @kjdelisle and came up with the following plan:

Monorepo: https://github.com/strongloop/loopback-next/issues/890
- connectors + juggler + dependencies like loopback-filters
- a different repo than loopback-next to keep lerna bootstrap fast enough
- preserve original repos - we will keep LB-3.x codebase there
Migrate juggler to typescript: https://github.com/strongloop/loopback-next/issues/891
Migrate individual connectors to typescript too: https://github.com/strongloop/loopback-next/issues/892
Drop callback APIs, use Promises only: https://github.com/strongloop/loopback-next/issues/896
Spike: Remove data-access APIs we don't want to support anymore, both from juggler and connectors, e.g. updateOrCreate, findOrCreate, etc. https://github.com/strongloop/loopback-next/issues/897
Spike: what to do with EventEmitters (Observables?) https://github.com/strongloop/loopback-next/issues/898
Semver-major release of everything (alpha pre-release or preferably a 0.1.0 release if we change names from loopback-* to @loopback/*)
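For the "Promises only" step, the mechanical part of the migration can be sketched as a wrapper around a Node-style callback function. `findByIdCb` is a hypothetical stand-in for a data-access method and `toPromise` is an illustrative helper, not part of juggler; Node's built-in `util.promisify` serves the same purpose for real code.

```typescript
// Node-style callback signature, as used by callback-based APIs.
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical callback-based lookup standing in for a data-access method.
function findByIdCb(id: number, cb: Callback<{ id: number }>): void {
  if (id < 0) cb(new Error('invalid id'));
  else cb(null, { id });
}

// Generic wrapper that exposes a one-argument callback function as a Promise.
function toPromise<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg: A) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, result) => {
        if (err) reject(err);
        else resolve(result as T);
      });
    });
}

const findByIdAsync = toPromise(findByIdCb);
```

Callers can then write `const row = await findByIdAsync(7)` inside an async function and handle failures with `try`/`catch` instead of checking an `err` argument.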

We will create follow-up issues later. There should be a special Epic to group these issues together.

We (@kjdelisle and I) have created follow-up issues for every step except step 7; we will handle publishing as part of our regular work.
