mitodl / ocw-studio

Open Source Courseware authoring tool
BSD 3-Clause "New" or "Revised" License

[mini-rfc] YAML schema validation #973

Open alicewriteswrongs opened 2 years ago

alicewriteswrongs commented 2 years ago

I wanted to open a (not very urgent) conversation about our approach to validating site configs here in Studio. Right now we are using a library called Yamale to validate site configs against a schema. This works pretty well for ensuring that a WebsiteStarter can't be saved with an invalid configuration and whatnot, but I think there are a few limitations/drawbacks to this approach:

I'm wondering if we might want to explore two separate things: 1) converting to a standardized data schema format like JSON Schema (or one specific to YAML, if it exists) and 2) pulling the schema validation out into a library or something so that we could have GitHub Actions checking the validity of our site configs in ocw-hugo-projects (and possibly elsewhere in the future).

Anyhow, just something I was thinking about, interested to hear what you all think!

gumaerc commented 2 years ago

I think moving to use JSON would be a fine idea, especially if it gives us better and more portable validation like you describe. It's always bugged me a bit that we have starter configs checked into the ocw-studio repo, when really the source of truth is ocw-hugo-projects. Even the name of that repo was sort of chosen out of necessity; I think it would make more sense to call it ocw-hugo-config or something like that. Either way, when we run unit tests in Studio that consume the starters, they should probably be using the release branch version of the starters in ocw-hugo-projects. Hugo doesn't consume the starter files, only ocw-studio does, so changing the format shouldn't have any consequences outside of how they are consumed in ocw-studio.

ChristopherChudzicki commented 2 years ago

I've used JSON Schema a bit in the past and was very happy with it... Most of my experience using it is through OpenAPI specs and with the JS library ajv.

Definitely appealing that it's not language-specific... Then there's more tooling around it, e.g., linters. And this always looked intriguing to me, though I've never used it: https://www.npmjs.com/package/json-schema-to-typescript

alicewriteswrongs commented 2 years ago

We don’t even necessarily need to use JSON as a storage format: we could continue to use YAML for writing the configuration, and then convert the YAML to JSON in order to validate it. I’m fine either way and not attached to either one, but we do have some options.
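A minimal sketch of that YAML-in, JSON-Schema-validation flow, assuming the third-party PyYAML and jsonschema packages are installed. The schema and the `validate_site_config` helper are hypothetical and heavily simplified; they are not the actual Studio site config schema.

```python
import yaml  # PyYAML (third-party)
from jsonschema import Draft7Validator  # jsonschema (third-party)

# Hypothetical, heavily simplified schema; a real site config has more fields.
SITE_CONFIG_SCHEMA = {
    "type": "object",
    "required": ["collections"],
    "properties": {
        "collections": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "label"],
                "properties": {
                    "name": {"type": "string"},
                    "label": {"type": "string"},
                },
            },
        },
    },
}


def validate_site_config(yaml_text: str) -> list[str]:
    """Parse YAML into plain dicts/lists and return readable schema errors."""
    # safe_load yields the same dict/list/scalar structure JSON would.
    data = yaml.safe_load(yaml_text)
    validator = Draft7Validator(SITE_CONFIG_SCHEMA)
    errors = sorted(
        validator.iter_errors(data),
        key=lambda e: [str(p) for p in e.absolute_path],
    )
    return [
        f"{'/'.join(str(p) for p in e.absolute_path) or '<root>'}: {e.message}"
        for e in errors
    ]
```

An empty list means the config passed. A CI job could call a helper like this on each config file and fail the build when the list is non-empty.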

json-schema-to-typescript sounds very interesting… I’ve used an OpenAPI-to-TypeScript library before which enabled automated backend-frontend type sharing. I think that could let us do something similar (right now our configuration types are basically declared in two places: in the Yamale schema and in TypeScript).
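To illustrate the single-source-of-truth idea, here is a toy sketch that renders a TypeScript interface from a JSON Schema object. It is not how json-schema-to-typescript works internally; it only handles flat objects with primitive or array fields, and `CONFIG_ITEM_SCHEMA` and the function names are hypothetical.

```python
# Toy generator: flat objects with primitive or array-of-primitive fields only.
TS_TYPES = {
    "string": "string",
    "integer": "number",
    "number": "number",
    "boolean": "boolean",
}


def ts_type(prop: dict) -> str:
    """Map one (simplified) JSON Schema property to a TypeScript type."""
    if prop.get("type") == "array":
        return ts_type(prop.get("items", {})) + "[]"
    return TS_TYPES.get(prop.get("type"), "unknown")


def schema_to_interface(name: str, schema: dict) -> str:
    """Render a flat object schema as a TypeScript interface declaration."""
    required = set(schema.get("required", []))
    lines = [f"export interface {name} {{"]
    for key, prop in schema.get("properties", {}).items():
        # Properties not listed under "required" become optional members.
        optional = "" if key in required else "?"
        lines.append(f"  {key}{optional}: {ts_type(prop)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical schema for a single config item.
CONFIG_ITEM_SCHEMA = {
    "type": "object",
    "required": ["name", "label"],
    "properties": {
        "name": {"type": "string"},
        "label": {"type": "string"},
        "category": {"type": "string"},
        "fields": {"type": "array", "items": {"type": "string"}},
    },
}

print(schema_to_interface("ConfigItem", CONFIG_ITEM_SCHEMA))
```

With something along these lines, the schema would be the only hand-written declaration, and the frontend types would be generated from it.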

I also agree about the checked-in configs. I think it would be good to automate using the configs from the ocw-hugo-projects repo and delete the checked-in ones.

ChristopherChudzicki commented 2 years ago

If we ever revisit this, I think we should consider JSON Type Definition (JTD), too. AJV has a nice comparison between JTD and JSON Schema.

JTD is newer and probably not as widely supported, which is definitely a con.

But JTD "describes the shape of the data" whereas JSON Schema describes "a collection of constraints on the data". Consequently, JTD integrates more smoothly with TypeScript (and probably other typed languages). As an example, I believe one consequence of this is