reactioncommerce / reaction

Mailchimp Open Commerce is an API-first, headless commerce platform built using Node.js, React, GraphQL. Deployed via Docker and Kubernetes.
https://mailchimp.com/developer/open-commerce/
GNU General Public License v3.0
12.34k stars 2.17k forks source link

Spike: Investigate transitioning from Simple-Schema to Mongo Schemas #5481

Closed brent-hoover closed 2 years ago

brent-hoover commented 5 years ago

Now that Mongo schemas have been out for a while, we should look into them and see how they could be leveraged to add a better level of data integrity/performance to using Simple-Schema.

janus-reith commented 5 years ago

I wondered the same thing when I found out that attachSchema is something Meteor-based that will become obsolete.

While the whole stack is tied tightly to mongoDB and there don't seem to be alternatives yet, I still wonder if it would be possible to keep the Schemas database-agnostic and simply enhance Simple-Schema to make use of Mongo Schemas where available.

Maybe Simple-Schema could somehow pass through data were applicable and thereby still give performance benefits.

And while we are at it: Moving a reference of the schema definitions to each registry.js would really help keep things more structured, as collections and their indices are defined there aswell. Currently all of them still are in /imports/collections/schemas, while attachSchema happens in /lib/collections and for some collections in the /lib or /client folder of plugins.

aldeed commented 5 years ago

Moving SimpleSchemas into their corresponding plugins is part of https://github.com/reactioncommerce/reaction/issues/4996, in progress and happening soon.

Pretty much none of the API is relying on attachSchema anymore. Most places should be doing explicit schema checks just before insert/update, which is all attachSchema does anyway.

MongoDB schema validation is done with JSONSchema. There are a few OSS packages in the area of SimpleSchema <-> JSONSchema:

Neither are quite able to convert from SimpleSchema to JSONSchema, but they can be used as reference and a converter is likely not too hard to write.

So then there are essentially two options:

This can be done as an experiment in a single plugin. We are no longer dictating how and where plugins store data so this is really a decision each plugin can make and implement. (In theory. There are still some places where one plugin directly queries collections owned by another plugin, but those could be cleaned up and abstracted as part of implementing this.)

focusaurus commented 5 years ago

Getting mongodb schemas going and enforced in the db: :+1:

JSONSchema seems like the most pragmatic approach. Hopefully we can find a good implementation of validation on npm. I've used a few in the past and the quality was so-so.

One stack to consider would be migrating our SimpleSchema code to joi and finding an npm package that could convert/downsample that to JSONSchema for mongo.

aldeed commented 5 years ago

I've used joi a bit and it's nice for what it is, but may not be best for the needs here. I recently tried superstruct and there are a few things I didn't like but overall it probably does not get much simpler, with the added benefit of being fast and everything being strings and scalars, which means it might be the easiest library to write a JSONSchema transform for.

(The main thing I remember not liking is that it does not have built-in concepts of things like min/max, etc. that would need more info to work. So you'd have to build a "min0" validator, then a "max100" validator, and so on. These custom validators would also be tricky to convert to JSONSchema.)

As much as I'd like to eventually not have to support SimpleSchema, given that it's already in use everywhere and familiar to Reaction devs, the quickest path forward on Mongo schemas is probably to write a JSONSchema transformer for SimpleSchemas.

janus-reith commented 5 years ago

I think it could be great to find a solution where we could also derive GraphQL types from, so Schemas don't need to be written twice.

janus-reith commented 4 years ago

Hey, has there been any further discussion on this and did someone maybe come up with a new idea?

aldeed commented 4 years ago

I'm not aware of any further discussion. The last I heard the thought was that a single plugin could try implementing mongo schemas for the collections it owns as an example and a trial, and then evaluate based on that. If someone wants to do a PR for that at some point, that could get things moving.

janus-reith commented 4 years ago

Hm, I would only want to put effort into a solution that reduces verbosity, by defining e,g, Product at a top level, then deriving both Apollo and DB Schemas from it, and whatever validation layer might be called on api calls then.

@aldeed do you know any solution that compiles down to both GraphQL Schemas and jsonschema? I did a quick search but found nothing.

Another thing I had in mind, for typical calls we would then have three steps: Apollo Schemas -> Validation in Function -> MoongoDB JSONSchema. We can't really skip any of them completely, and while Apollo Schemas and MongoDB Schema probably have a small footprint on performance, the Validation inside a function might have one but also offers the most specific possibilities of validation and also might be necessary in cases where it prevents sideffects inside some function, as we don't necessarily have a straigt path where data goes from where. (Hope it is clear what I mean?) So maybe it might be possible without added complexity inside some called function, to see if that has passed graphql validation, and therefore inside that next validation step, skip all field validations that are not more specific than thei already checked Apollo pedant - Just an idea. All fuctions, e.g. queries on context should still work outside from being directly called from a graphql call, therefore this should stay optional. If only function parameters would work as sure indication that something passed graphql validation, those could be set on each of the resolver when they call queries on context as someoptiona parameter. But maybe this wouldn't even be necessary if we could gain insights on the chain from context itself maybe.

Would love to hear your ideas and opinions on this.

aldeed commented 4 years ago
set321go commented 4 years ago

I did a minimalist spike today just looking at adding a schema to a collection and getting it into mongo. https://github.com/reactioncommerce/reaction/compare/trunk...set321go:SPIKE_5481

Its not a true conversion of the Shop schema

  1. I omitted anything that was marked as deprecated
  2. I removed blackbox objects and just made them strings
  3. I changed dates into strings because there is no native date type

I realized while i was doing this that the mongo implementation of jsonSchema has quite a few limitations. Some of the big ones that i was planning on using were:

  1. definitions & $ref so you're unable to split up a large object and compose it like is already done in simpl-schema.
  2. default many places where a required value is specified it would have been handy to specify the default value

If i was to extend this and use it for validation in the app i would probably use the ajv package as it has clear instructions/support for using different versions of the spec.

EDIT: I should probably add that I am in favour of a solution that uses something that is not specific to mongo. json-schema is a good option as its wide support makes it portable and robust, I like Joi as an option but I think json-schema has more going for it in this situation.

aldeed commented 4 years ago

@set321go This is very helpful research, thanks! I did not realize $ref isn't supported. That is unfortunate. Default values aren't critical but would also be nice. I see a longstanding issue for that here: https://jira.mongodb.org/browse/SERVER-24430

Note that to be the same as SimpleSchema, every object will need additionalProperties: false in the JSONSchema because the default is true. For the blackbox: true objects, those can be objects with no defined properties and additionalProperties: true.

I like your implementation in ReactionAPI.js. If you want to submit a pull request with only the changes from that file, I think we could merge that pretty easily and it would allow other people to experiment more with this. When you get the shop schema in a place where it's identical, you could submit a pull request for that, too. I'm not certain whether we're going to go down this road with the built-in plugins, but a PR would be a good way to test and discuss the pros and cons.