Identify a list of critical interfaces that implementations should have and empower implementers with better resources

Relequestual commented 1 year ago

The ecosystem has many implementations across many languages, but it's often a real challenge to advise people how to do some things the core team would consider fairly basic, for all implemenations.

The specification tells you what must be true about the JSON Schema, and it lays out requirements for what implementations must support, but it doesn't tell implementers how to support specific elements.

One basic example, being able to reference other schemas is something that's required. Some implementations provide an interface to add schemas ahead of time, while others provide a hook function to allow for resolving references at run time.

One way we can help fix this problem is to identify a list of critical interfaces that implementations should have, and empower implementers with better resources.

[x] Create an initial list based on functionality depended on by Bowtie
- Known initial examples (to be modified):
  - Draft selection
  - Automatic draft selection based on $schema
  - "Registering" of schemas for referenceability
  - Validation of schemas themselves beyond the metaschema
- Additional Items to Include (primarily optional specification behavior):
  - Enabling format support
  - Output format support
  - Annotation support
[x] Decide on serviceable terminology for referencing each piece of functionality
[x] json-schema-org/community#498
[x] Ensure implementers of new implementations have some way of recognizing they may want to update the interfaces page with information on their implementation when submitting it for inclusion on the implementations page (PR merged: https://github.com/json-schema-org/website/pull/173)

Assessed as high impact/low effort during our collaborators summit 2023.

Relequestual commented 1 year ago

While this wouldn't be a deliverable for this Issue, it would be awesome if we could encourage implementations to also create and encode their supported interfaces, and link to related docs for each interface. This would be dependent on other work not yet done, but recording it for future us.

gregsdennis commented 1 year ago

Just looking at the options my lib supports, these might make this list (in addition to some of the above):

Vocabulary registration (includes custom keyword definitions) - This seems fairly critical to supporting 2020-12.
Logging (e.g. validation walk-through) - My implementation can product logging output as it navigates the schema(s) and evaluates each keyword. Good for debugging. Not mentioned in the spec, though, so ??? (Edit: as of v5, my implementation no longer supports logging. New architecture doesn't allow it in the same way.)
Automatic resolution and fetching of references - Should be disabled by default, but mine (and my guess is a lot of others) can support this.
Specifying/Selecting output format

@jdesrosiers also does some pre-processing (compiling) in his lib. I wonder how many others do something like this. (Edit: as of v5, mine now does this.) (Probably worth a separate issue, but it might be nice to explore the analysis that's he's doing and document, if not standardize, some of it as a pattern to follow.)

Julian commented 1 year ago

(The list I put in the description was super barebones, I did less than 30 seconds of thinking -- with a task itself to go collect said interfaces -- all that being said, keep listing em! Definitely useful.)

gregsdennis commented 1 year ago

For this work, are we expecting to perform a survey of implementations to document (or link to docs) how they implement these interfaces?

Julian commented 1 year ago

I assume by "how they implement" you mean more "what function do I call" rather than "what is their internals" -- in which case yes I think so in the sense that we can then possibly generalize into "look for something like X" -- but I don't think we can/should necessarily list for every implementation what the precise API is, that's quite hard for the same reason as:

link to docs

(I don't think this will work well though personally -- interlinking in certain languages is easy, but definitely not all, let alone lots of languages all at once. I think actually linking to docs will be a recipe for broken links and/or lots of maintenance to keep the page working)

gregsdennis commented 1 year ago

I assume by "how they implement" you mean more "what function do I call" rather than "what is their internals"

Yes.

Relequestual commented 1 year ago

link to docs

(I don't think this will work well though personally -- interlinking in certain languages is easy, but definitely not all, let alone lots of languages all at once. I think actually linking to docs will be a recipe for broken links and/or lots of maintenance to keep the page working)

I think linking to docs of specific implementation features should be "self service" data driven from the source repo. For it to scale, we need a lot to be driven automatically, probably by Actions. Any such details should be practically no maintenance cost for us.

gregsdennis commented 1 year ago

Our implementations page is mostly self-service. I think the downside to that approach is that implementations aren't removed (or marked) when they are no longer maintained.

If I, a hypothetical random developer, stop working on my project because of time or whatever, I'm not going to think to go to the JSON Schema website and update it. I'm just going to stop working on the project. Most projects just go stale, even when GitHub provides the ability to archive it.

Relequestual commented 1 year ago

@gregsdennis sure, I totally agree with your concerns. I do think we can make some efforts to identify projects that have potentially gone stale. Although, I won't go into detail here as I think that's probably off topic for this issue. Do keep it in mind though! Totally right to raise the concern.

benjagm commented 1 year ago

As peer the last OCWM json-schema-org/community#429 : In the last OCWM we discussed options such as connect this with the self-reporting tasks of json-schema-org/community#412. Approaches like creating a Google form to gather information and store the info in a JSON file in the repository were commented. The team suggested to check the list of interfaces again a group well known implementations. @benjagm suggested being transparent and creating a working group to invite others to join their research efforts, with the hope of achieving more positive results.

gregsdennis commented 1 year ago

Julian commented 1 year ago

I'm starting on this document here.

You'll of course notice that's a page in Bowtie's documentation, which isn't to say that's where this will/should land, it's simply convenient to stick it somewhere for the minute, though indeed I likely will cop lots of this info for Bowtie's documentation anyhow after it's done, but I suspect folks will likely decide to put the page somewhere more like the issue description here, in the website documentation. Sharing immediately though in case anyone wants to throw in comments as I fill in the first version before a PR.

Relequestual commented 1 year ago

I've added the agenda label as I'd like to directly ask for feedback on the document @Julian has written so far.

Let's see if we can move this forward. @Julian Can you suggest a timeline for the next item in the list?

Decide on serviceable terminology for referencing each piece of functionality

An initial draft is fine, and we can review when we look to move/copy it into a JSON Schema org repo via PR.

Julian commented 1 year ago

The headings are the suggested terminology for the minute -- they cover all the ones mentioned in this ticket plus some extra ones -- though without descriptions it's hard to guess what's meant by each one -- I'm trying to get my keynote done last week and this one but next week I was hoping to at least fill out a first few there, so I think "end of next week" is a good time for at least reviewing the first few there.

gregsdennis commented 1 year ago

Henry called out that "@Julian has a really awesome exception interface for working with errors, including a programmatically queryable tree interface."

I think an error handling interface warrants inclusion. Others mention exceptions or other mechanisms to indicate that evalation halted for some reason.

Julian commented 1 year ago

I think an error handling interface warrants inclusion.

Whether you support using exceptions to indicate invalid instances is/will be this section but I'll add an additional one for exception-detail introspection.

I honestly don't know where to put ErrorTree categorically, I've never seen another implementation which has it, it sort of serves a similar purpose to output formats but it's obviously language-dependent, but it's a good call out.

gregsdennis commented 1 year ago

I'm not merely calling out what you have, but we're also looking at better defining ref resolution failures. In the associated issue (where Henry's comment came from), @jdesrosiers pointed out that exceptions aren't the only mechanism that can be used to communicate errors. That's what I'm after. What are the error reporting mechanisms for when a schema can't be loaded/processed?

Julian commented 1 year ago

@jdesrosiers pointed out that exceptions aren't the only mechanism that can be used to communicate errors.

Can you point out more specifically where you're referring to and/or what mechanism you're talking about?

gregsdennis commented 1 year ago

https://github.com/json-schema-org/json-schema-spec/issues/1276#issuecomment-1217291231

Julian commented 1 year ago

Got it, I think the point of that comment is already represented in the page but if you see other opportunities once it's more fleshed out obviously let me know.

gregsdennis commented 1 year ago

Are you talking about the "Exception-Driven Validation" section? This is subtly different.

That is using exceptions to give a validation result. This'what I'm talking about is using exceptions to indicate that a validation result could not be determined.

Julian commented 1 year ago

I've added another line to the validation section to give a nod to the above, have another look.

gregsdennis commented 1 year ago

a separate language specific mechanism is generally used for indicating out-of-band error conditions encountered during validation that were not statically detected or detectable; exceptions, error values, or wrapper result types are examples of such mechanisms

I think this is good, but I wonder if it warrants being listed separately as an API feature. Currently it's just listed as a note under "instance validation," but really it's separate from validation.

Julian commented 1 year ago

Is it an interface? To me it doesn't seem like one, and any validation-related interface is likely to also deal with it, which is why I put it where I did.

gregsdennis commented 1 year ago

Is it an interface? To me it doesn't seem like one

Maybe not as such, but it's something that every implemetation will need to have a mechanism for.

any validation-related interface is likely to also deal with it

Yeah, but that's just because it's a global thing. Any implementation will have to deal with it, whether validation, generation, or something else. It's not specifically validation-related.

Julian commented 1 year ago

To me that's a reason to make it less prominent, because it's a general language consideration that when you do things you'd have to consider how your programming language represents errors. I'll have another look to see whether it can go more globally, though yeah honestly to me it doesn't seem like the sort of thing that needs a ton of stressing personally.

gregsdennis commented 1 year ago

Well, we're adding it to the spec for failed reference resolution. To me, it seems like it should be listed.

Julian commented 1 year ago

This isn't meant to be a one stop shop page for all things an implementer needs to consider, it was meant to be about what APIs to offer really. I don't see the relevance personally, but let's see when the page is more filled out.

Julian commented 1 year ago

Inching closer to opening this up wider and/or splitting it off for a PR to the website, there's now a rendered version of the page here.

Greg I gave another clearer-head shot at addressing your suggestion. Let me know if you still think it needs even further discussion. The one thing I feel somewhat strongly about is not giving it an "interface" / heading section given it's not an interface, but other than that I think if it's still unclear we can keep working on it.

Relequestual commented 1 year ago

The new website should be going live on Monday. I think it's fine to turn this work so far into a PR for the new website now.

I think this specific definition needs expanding: https://docs.bowtie.report/en/interfaces/interfaces/#dynamic-uri-resolution - Such that implementers understand they can allow the user to define the resolution process any way they see fit.

On our 1:1 call yesterday, we discussed that a JSON encoded representation of available interfaces could be created by each implementer, and such could be turned into some kind of score. On reflection of the document, I'm now not sure that's appropriate. Some interfaces are essential, while others are optional.

Ideally, with the PR, I'd like to see a data encoded version of the interface definition. Specifically, defined names and if such an interface is required, recommended, or purly optional.

As previously mentioned, follow up work on this issue will be creating a schema, inviting implementers to self report via a JSON file in their repo, and then collecting and aggregating that data in our implementations listings. Out of scope of this specific issue, but good to keep in mind.

benjagm commented 1 year ago

As per json-schema-org/website#158 we have planned a section in the docs for Tutorials that can be a good place to add this documentation to the new website.

Relequestual commented 1 year ago

@Julian Given you/me/we detail in the task for this Issue that the website page will be non-normative, where do you suggest we should house the normative version of such information?

I've thought about this a little and I wonder if it might be appropriate to even create a new repo for defining and holding data related to implementations. We would then have a separation of concern for the data and website, and can then push the data to the website as required, or even pull in the other repo as part of the build process.

If we collected implementation data in the website repo, and it was automatically updating on changes in the repos of implementations, it could get pretty noisy. I feel it could make sense to avoid that.

Plus, we can then (hopefully) use the same data for the "landscape" setup later.

Julian commented 1 year ago

Given you/me/we detail in the task for this Issue that the website page will be non-normative, where do you suggest we should house the normative version of such information?

The only normative document we have at this point is the specification, is it not? Are you asking whether this should go in the spec somewhere? Or simply independently whether we should start a repo for the documentation info?

Relequestual commented 1 year ago

The latter. I don't think it belongs in the specification. Maybe "normative" only applies to specifications. I'm thinking, where does the canonical and authortative version of this live?

I've seen a number of repos elsewhere that gets updated automatically on a daily basis, and that meakes it hard to tell if the thing doing the work has been updated recently. Maybe there should even be two repos... one for the commoninterfaces and other implementation data point definitions, and another for the actual data collection (when we get there).

For now I think having just one new repo would be enough. I feel like it should be able to stand on its own.

Julian commented 1 year ago

I'm still not following I guess, sorry -- when you say "this" -- you don't want the canonical version of the page to be the website? Or you're talking about just where any future data for implementation-specific documentation that we'll pull into the page goes, and creating one new repo to house that?

Relequestual commented 1 year ago

Apologies, you're right, I'm not being clear.

I'm unsure if the canonical version of the page should be in the website repo. I want information about the list of interfaces to be machine readable, including having identifyable canonical code names (like linter rules). I imagine we would encode that all in JSON (of course). I assume it would not be that difficult to re-generate the markdown we have in the website now using the data (that would be the aim).

A repo which housed such a data file, I imagine would be the same repo which houses other similar implemenetaion data, which I expect will initially be self reported, and later scooped from the repo directly. Either way, such data would need a format defined, and I imagine such a definition woul exist in the same repo.

Hopefully that makes sense. I expect that a data file which defines all/most of the common interfaces, including the prose, would be one of the facets that exists in such a repo.

Julian commented 1 year ago

I think we're (all) mixing a bit of things together when it comes to what this page or the broader set of things we want to document for implementers are -- obviously we need to start somewhere with something, so I gave a shot at what I was myself intending at the roadmap summit -- namely a page which at least lays out a bunch of interfaces that implementations I've seen have, may have, or should have (with no real intended "controversial" opinion there).

When writing Bowtie harnesses, I definitely needed to find various equivalents of these interfaces in libraries, and not even having a page defining them makes it hard to even ask their author whether they have one. So I literally had to trawl through docs for many implementations, sometimes finding what I needed, sometimes not finding it.

The same thing seems to happen often in the Slack, where someone will say "I have 2 schemas and they reference each other, how do I use them with this implementation", and we seem to lack vocabulary for even speaking about "well, go find where in your implementation is where you build up a schema registry and provide it to your validation function", which is similar to the above.

On the other hand, other things you're referencing don't seem to me to fit with the page at least not in this initial form, which isn't of course to say we don't need them or couldn't add them to this or a second additional page (nor that I object personally to any of the above even).

But e.g. I definitely wouldn't have thought we'd want to lint for any or all of what's here myself. Whether an implementation has an interface for taking strings vs. language-level objects definitely doesn't seem like a thing I'd use a linter to check, it's just a thing a user of the library needs to know -- do I deserialize JSON myself, or let you the JSON Schema implementation do it for me, or you don't do it for me either and you still want strings? I (now with my Python implementation hat on) would never add such an interface taking strings, it doesn't make any sense in the context of Python. On the other hand, in Go, somehow one or two of the implementations have no way to validate anything but strings, so if you do have deserialized JSON you need to go back to a string to validate it. I don't know the norms in Go, maybe that's what Go developers expect!

Similar things apply to other (or most) of the interfaces I think, I'm not by writing the page saying "you should have all of these", simply "these are things I believe exist in multiple implementations and aren't explicitly bad ideas", and those interfaces are things one needs in conversation when trying to either use or help someone use an implementation, and therefore for implementers they're interfaces they should explicitly think about (e.g. "Do I have this?" "where is my way to do this" or "should I have this?".

So, yeah just making sure that context is there for what I understood this first pass to be about.

It seems as I say likely that you have other or additional things in mind, because otherwise I wouldn't from a technical perspective expect to generate this page from data because it seems like doing so just adds complexity -- I'd perhaps expect to render additional things into it (similar to any page which has partially static and partially dynamic content) -- that would be if we decided to include implementation-specific examples for each of these which is what I thought you were referring to until now. E.g. if you want to show "for the implementation json-schema in clojure, here's how you populate a Schema registry and use it" that likely seems like something you likely want in a repo (all the interfaces + all the implementations' examples for them) and then you take all that data and somehow render all of that into this page -- I certainly know how to do so in other frontend stacks but not in however the website here is architected (meaning I simply haven't looked or learned).

If that (the last paragraph) is indeed what you're referring to then yeah sure I think we could create a repo to start building up that data. It seems like answering how that data ultimately gets rendered into this page and/or whether this page therefore itself gets moved to canonically live somewhere else should then get answered by whoever writes the logic for rendering the two together, I certainly have no personal technical preference there (though if that ends up being me I of course may have one after investigating how one would do so given our website technologies).

Relequestual commented 1 year ago

So, yeah just making sure that context is there for what I understood this first pass to be about.

I agree with all of that. When I mentioned linter rules, I was only refering to how they have distinct names, not that we should be trying to make sure an implementation has all of these interfaces. As you more or less say, that would be silly and unhelpful.

It seems as I say likely that you have other or additional things in mind, because otherwise I wouldn't from a technical perspective expect to generate this page from data because it seems like doing so just adds complexity

For sure. Having implementations self report aspects is going to be the easiest way to maintain correct and up to date data about implementations. This goes beyond just validators, and into all types of implementations, which we can later use to augment the implementations page, and later an ecosystem "landscape" diagram.

If that (the last paragraph) is indeed what you're referring to then yeah sure I think we could create a repo to start building up that data.

I think it's pretty close, yes.

It seems like answering how that data ultimately gets rendered into this page and/or whether this page therefore itself gets moved to canonically live somewhere else should then get answered by whoever writes the logic for rendering the two together

Agreed. I feel like this would be the focus of a new Issue, allowing this one to be closed off when the PR is merged.

Back to, why are we having this discussion...

The Issue https://github.com/json-schema-org/community/issues/498 was created from the third item of the checkbox list in the opening comment of this Issue.

Add the list of common interfaces with some short prose documentation to a new (non-normative) page on the new website

By saying "non-normative", it made me wonder if you had an idea of where a normative version should be. But, I think in stead you meant "this isn't something anyone should interprite as required", which is fine.

All that to say, I think this is all fine, and the PR could close THAT issue, and THIS issue.

However, I now wonder, what did you mean by the last item on the list?

Ensure implementers of new implementations have some way of recognizing they may want to update the interfaces page with information on their implementation when submitting it for inclusion on the implementations page

Julian commented 1 year ago

Ah cool ok sounds like we're indeed closer to being on the same page than it seemed to me before my last comment, cool.

Having implementations self report aspects is going to be the easiest way to maintain correct and up to date data about implementations. This goes beyond just validators, and into all types of implementations, which we can later use to augment the implementations page, and later an ecosystem "landscape" diagram.

Yep, agreed here too, definitely agree with the general idea behind such a repo.

Agreed. I feel like this would be the focus of a new Issue, allowing this one to be closed off when the PR is merged.

+1

But, I think in stead you meant "this isn't something anyone should interprite as required", which is fine.

Yes, indeed, sorry :) words are difficult. All I meant was "and don't force people to implement these by adding it to the spec".

All that to say, I think this is all fine, and the PR could close THAT issue, and THIS issue.

+1

However, I now wonder, what did you mean by the last item on the list?

So here I meant that when someone writes a new implementation they may need to be directed towards this interfaces page -- so we (anyone who merges a PR to the implementations/ page) need to know to add it to our "routine" of things we share with an implementer. That becomes especially the case if they also want to add some stuff to the data repo we're talking about here.

In other words, some new implementer shows up and says "I wrote a new foo jsonschema library". The person who reviews that PR now needs to know to probably share at least 3 things I think:

Hi! foo looks great! You should double check the interfaces page as well to see if you have all of those which make sense for your library. You should also send a PR to this data repo which documents what specifically those interfaces are for your library. And hey consider writing a Bowtie harness too, that's here"

(Maybe even more than 3!)

Obviously we could do the above by adding a PR template to the repo (which had checkboxes for those 3 or something). Or we could also leave it lax and have us reviewers comment with it. The checkbox there was simply reminding me to raise that after we merge the page -- as a Slack comment to all of us, or whatever makes sense (obviously if you have thoughts share em).

Julian commented 1 year ago

How do we feel about a PR template for "I want to add a tool to the implementations page" and to adding this to it (and closing out the last item)? Any objections?

benjagm commented 1 year ago

How do we feel about a PR template for "I want to add a tool to the implementations page"

This will be really cool. This is the issue template for the same purpose in the tools page for OpenAPI: https://github.com/OAI/Tooling/issues/new?assignees=&labels=&template=add_tool_request.md&title=

Relequestual commented 1 year ago

I knew you could create multiple templates for Issues but didn't think you could for PRs. You can! https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository

Only trouble is, unlike with Issues, it doesn't suggest you select one when creating your PR. PR templates can only be used by URL query params. There's a workaround, which almost has the same end result: https://stackoverflow.com/a/75030350/89211

You can create a manual template selection, such that at least developers who open a PR can click on a link to get to their respective template:

Assuming you have two templates group_a_template.md and group_b_template.md under .github/PULL_REQUEST_TEMPLATE

Create the default template .github/pull_request_template.md with the following content
Please go the the `Preview` tab and select the appropriate sub-template:

* [Group A](?expand=1&template=group_a_template.md)
* [Group B](?expand=1&template=group_b_template.md)
In this way, people who open a PR interactively in the UI will first get to the default template and can open their respective target template from the "Preview" view.

Not optimal, but more convenient than patching the URL manually every time you submit a PR.

I'll modify the item in the list to better reflect what we've decided, and copy the above to a new issue.

Julian commented 1 year ago

The other option of course is to simply add an action which comments with those steps whenever something modifies the implementation page.

Relequestual commented 1 year ago

I think that may be preferable actually, as we don't know if the person making the PR is the implementer or not.

Julian commented 1 year ago

Submitted as https://github.com/json-schema-org/website/pull/173!

json-schema-org / community

Identify a list of critical interfaces that implementations should have and empower implementers with better resources #408