opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0
3.44k stars 634 forks source link

Intra-blob JSON validation correctness and image-spec release policy #406

Open wking opened 7 years ago

wking commented 7 years ago

We're moving towards keeping all intra-blob JSON validation in this repo since this discussion and @stevvooe has been redirecting intra-blob JSON validation PRs from image-tools to this repo. But working up #358 and #404 has left me concerned about how well our JSON Schema covers our spec. And JSON Schema is currently the only validation we do here, although see #403 about preparing to do more. With that intra-blob JSON validation happening in image-spec, we have to do at least one of:

a. Block image-spec releases until intra-blob JSON validation is robust enough for us to use it for certification. b. Cut image-spec releases when the Markdown is ready, and then cut patch releases in a 1.0.x branch (or something) as the validation improves. And probably port those validation fixes between the 1.0.x branch (or whatever) and the master branch. c. Cover any errors or deficiencies from the released image-spec intra-blob JSON validation with image-tool intra-blob JSON validation. For example, if we'd cut 1.0 without something like #404 landing, image-tools would have to land something like opencontainers/image-tools#45 to cover.

None of these sound particularly appealing to me, but as long as validation (even if it's just JSON Schema validation) lives in image-spec, my personal preference is (a). Do we want to form an official policy on how we'll handle this before we cut 1.0?

vbatts commented 7 years ago

@wking I read this, and am not real clear on the point? I'm inclined to close this as I feel like I haven't heard how it's critical for v1 considering.

wking commented 7 years ago

On Thu, Mar 09, 2017 at 08:01:24AM -0800, Vincent Batts wrote:

@wking I read this, and am not real clear on the point?

I think we need to have a maintenance plan for intra-blob validation before we cut a 1.0 spec release. The current approach seems to be (b), except without backporting JSON Schema fixes, or patch releases in the spec repo. Unless something about that changes, I think we're going to have maintenance problems in image-tools.

I'm inclined to close this as I feel like I haven't heard how it's critical for v1 considering.

If we want to punt on figuring out our intra-blob validation maintenance policy, I guess we can. But I'd feel more comfortable if we had a plan ;).

vbatts commented 7 years ago

again, you just restated the same kind of words with out clarifying the issue you're seeing. I'm not clear.

wking commented 7 years ago

On Thu, Mar 09, 2017 at 10:42:00AM -0800, Vincent Batts wrote:

again, you just restated the same kind of words with out clarifying the issue you're seeing. I'm not clear.

Say we close this issue without establishing a policy. Then we cut 1.0. Then we find a bug in the JSON Schema, or the wrapping Go, or some other non-spec validation tooling. What happens next? There are a number of choices, and I think we should think it over and at least have an initial plan before we find ourselves in that situation.

Once we do have a plan, image-spec can adjust to be ready for it (e.g. by not using the spec's declared version in 1 if it might end up using a *-dev release of the spec to get correct validation for the non-dev 1.0.0 release).

xiekeyang commented 7 years ago

@vbatts We might need more validation on JSON objects, besides schema/*.json, such as how to handle dis-matched result, and should valid child fields media-type if exist. Current schema/validator.go take some validation work, but seems not enough. We have not a clear plan or idea for this.

jonboulle commented 7 years ago

I think wking's position can be summarised as concern that we're not appropriately testing the json schema that we're shipping, and so it's quite possible (probable?) that whatever we call "1.0" actually has a divergence between the spec-as-markdown and spec-as-json-schema. Is that accurate?

wking commented 7 years ago

Is that accurate?

Yes, and my concern is not just for the JSON Schema. Everything from this repo which is consumed by image-tools but not part of the Markdown spec is in the same boat.

For example, if the intra-blob validation doesn't grow a violated-SHOULD-warning API until 1.1, it will make it hard to warn/error on those when image-tools is validating a 1.0 image. You'd need one of the a/b/c approaches, or support for multiple versions in image-spec.

xiekeyang commented 7 years ago

I remember we had discussed about the framework on validation, here I list them roughly, to help us for next improvement on image-spec and image-tools. Welcome forkers to add and correct.

  1. As previous suggestion, schema JSON had better to be dropped, instead to use golang directly. That mean schema.Validator should be reconstructed.
  2. Auto-detection of media type of JSON object should not be used any more. Consumer should point out the media they want to find out or validate.
  3. How to handle intra-blob validation, error or warn.

cc @stevvooe @wking @vbatts

wking commented 7 years ago

On Mon, Mar 13, 2017 at 04:06:14AM -0700, xiekeyang wrote:

  1. As previous suggestion, schema JSON had better to be dropped, instead to use golang directly. That mean schema.Validator should be reconstructed.

This gets raised periodically, but I'm skeptical. JSON Schema is a domain-specific language designed for exactly this purpose, and trying to implement the same checks directly in Go seems like a lot more work than it would be worth. And I don't think it impacts the release process anyway, since the intra-blob validation API provided by image-spec should be independent of the implementation (JSON Schema or otherwise).

  1. Auto-detection of media type of JSON object should not be used any more. Consumer should point out the media they want to find out or validate.

I don't think anyone is arguing against this, and all current proposals I'm aware of for image-spec APIs involve explicit media type declarations.

  1. How to handle intra-blob validation, error or warn.

This is definitely going to impact the validation API. With you're giving up on #452 and #403 being closed, I think we're just waiting on the maintainers to show us what they want. However, each of the release policies in 1 would apply to that sort of API change as well. It's not clear to me if the image-spec maintainers intend to block 1.0 on stabilizing that API (policy (a)) or if they intend to cut 1.0 before landing that API (policies (b) and (c)).

stevvooe commented 7 years ago

This gets raised periodically, but I'm skeptical. JSON Schema is a domain-specific language designed for exactly this purpose, and trying to implement the same checks directly in Go seems like a lot more work than it would be worth. And I don't think it impacts the release process anyway, since the intra-blob validation API provided by image-spec should be independent of the implementation (JSON Schema or otherwise).

I think we have months of evidence proving the contrary. We have had JSON schema in place for some time and yet it has missed a number of edge cases. Here is an example where we have to implement secondary validation. I am not sure if we are using gojsonschema in a non-optimal manner or there are other problems but a solid piece of Go code could have shipped already.

It also looks like we either need to leverage `gojsonschema's error types or specify something on our own. The goal here would be to have the validator return all errors, then let the consumer decide which ones matter.

xiekeyang commented 7 years ago

This is definitely going to impact the validation API. With you're giving up on #452 and #403 being closed, I think we're just waiting on the maintainers to show us what they want.

I close #452 partly because I know maintainers didn't achieve ideal agreement yet, partly because I feel it is not very graceful.

Actually we are mostly talking about validator especially on edge case. we almost agree it SHOULD be only for OCI image JSON objects, and MUST cover all of them, including each of intra items.

I had tried to use pure Go instead gojsonschema, but it is really so big effort. we have to implement secondary validation, maybe yes. But we should reform validator, if condition and function map looks a little ugly.

About validator API, we maybe need more input, such as reader, which kind of type (consumer should be able to input it directly), and mode for intra items(like customized or canonical).

It also looks like we either need to leverage `gojsonschema's error types or specify something on our own. The goal here would be to have the validator return all errors, then let the consumer decide which ones matter.

Agree.

wking commented 7 years ago

On Thu, Mar 09, 2017 at 10:50:13AM -0800, W. Trevor King wrote:

Say we close this issue without establishing a policy. Then we cut 1.0. Then we find a bug in the JSON Schema, or the wrapping Go, or some other non-spec validation tooling. What happens next?

It's not documented in this repository (more on this in #713), but with 1.0.0, @vbatts excluded schema from semantic versioning:

While the schema and validation (./schema/) may continue to iterate and improve, this release specifically covers specification documents (pdf, html, or ./*.md) and reference golang source (./specs-go/).

That leaves (b) and (c) as the only tenable approaches. Whether (b) remains tenable depends on how robustly image-spec decides to support intra-blob validation in patch releases.

sudo-bmitch commented 6 days ago

I'm looking through old issues to see what needs cleaning. Is this still an issue for anyone in the thread? If so, it would help to have a definition of what "inter-blob validation" means in this issue (perhaps I missed it, or it was in one of the links).