hapi-server / data-specification-schema

JSON Schema for HAPI

Refactoring tasks #1

Open jbfaden opened 1 year ago

jbfaden commented 1 year ago

Bob, Jon, and Jeremy met to discuss the validation using JSON schema. We resolved to use the existing code within the verifier to do this, though with some changes a standard website could be used instead. That might be done in the future, but it's not too difficult to use nodejs.
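
For reference, the schema step itself is only a few lines of nodejs. Here is a minimal sketch assuming the ajv package; the file names are hypothetical, and the HAPI schema file is assumed to hold one sub-schema per endpoint under keys like "info":

```javascript
// Minimal sketch of JSON-schema validation in nodejs using ajv.
// File names are hypothetical.
const Ajv = require("ajv");
const fs = require("fs");

const schemas = JSON.parse(fs.readFileSync("HAPI-data-access-schema-3.0.json", "utf8"));
const response = JSON.parse(fs.readFileSync("info-response.json", "utf8"));

const ajv = new Ajv({allErrors: true});
// Use the per-endpoint sub-schema if the file is organized that way.
const validate = ajv.compile(schemas["info"] || schemas);

if (validate(response)) {
  console.log("schema valid");
} else {
  console.error(validate.errors); // list of schema violations
}
```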

Bob will:

Jeremy will:

Jon will:

rweigel commented 1 year ago

verifier-nodejs contains code that does checks on (1) the API (e.g., that it supports start and time.min) and CSV data responses (e.g., correct number of columns) and (2) the JSON responses. When you run validate.js, it only checks against the JSON schema. But it should really also do the tests that cannot be described with the schema (e.g., if a time-varying parameter is referenced, that parameter is actually in the dataset).
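
A sketch of what one such semantic test could look like (hypothetical; the property names are assumptions, and this is not the verifier's actual code):

```javascript
// Hypothetical semantic check: if a bins element refers to another
// parameter by name, that parameter must exist in the dataset.
// A JSON schema cannot express this rule.
function checkBinsRefs(info) {
  const names = new Set(info.parameters.map(p => p.name));
  const errors = [];
  for (const p of info.parameters) {
    for (const bin of p.bins || []) {
      // Where the ref lives is an assumption here; see the
      // bins-as-object discussion later in this thread.
      const ref = bin.ref || (bin.centers && bin.centers.ref);
      if (ref && !names.has(ref)) {
        errors.push(`parameter '${p.name}': bins ref '${ref}' is not a parameter in this dataset`);
      }
    }
  }
  return errors;
}
```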

So this is another action item.

jbfaden commented 1 year ago

Isn't that what verifier-nodejs is for? I guess this is something like Schematron, which we were just talking about in another meeting, where it goes a bit further than an XML schema does. But I'm concerned that this project is creeping into semantic verification, where another language (like nodejs) is needed to define semantic checks. We already have the verifier-nodejs project for this.

rweigel commented 1 year ago

I guess my point is that you'll be able to create a bunch of test JSON files in this repo that are JSON-schema valid but semantically invalid, and you'll have to stand up your server and point the verifier at it before you find out.
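
For example, metadata like the following could be schema-valid yet semantically invalid, because the referenced parameter does not exist (the shape of the ref object is assumed here):

```json
{
  "HAPI": "3.0",
  "status": {"code": 1200, "message": "OK"},
  "startDate": "2000-01-01T00:00:00Z",
  "stopDate": "2001-01-01T00:00:00Z",
  "parameters": [
    {"name": "Time", "type": "isotime", "units": "UTC", "fill": null, "length": 24},
    {"name": "flux", "type": "double", "units": "W/m^2", "fill": "-1e31", "size": [3],
     "bins": [{"name": "energy", "units": "eV", "centers": {"ref": "energy_centers"}}]}
  ]
}
```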

I anticipate everything being runnable in a web browser.

Yes, many of the semantic tests could be implemented in something like Schematron if we were using XML.

rweigel commented 1 year ago

Some thoughts to discuss later today.

Maybe instead of hapi-server.org/verify, there should be

  1. hapi-server.org/verify-metadata (provide a URL, upload a file, or copy text and run schema and semantic tests)
  2. hapi-server.org/verify-server (check that all required endpoints are present, that the API responds with correct error codes, that CSV responses have the correct number of columns, etc.; also do the metadata checks)

I have a version of item 1 that also works on the command line (the one you've been using). However, it does not have the semantic tests.

Also to discuss: I sometimes think the word "verify" is wrong because the libraries used for schema tests typically state that they do validation. That is, they check whether a document is "schema valid". I've been using "verify" because of its typical definition:


jbfaden commented 1 year ago

I get the two mixed up when talking about this too. How about "syntactic validation"? This is the validation that any JSON schema validator can do--it just needs the document and the schema. "Semantic validation" would require additional code (the existing verifier) that understands what the syntax means and contains the business logic.

jvandegriff commented 1 year ago

I like the separation of these two concepts. We don't have to exactly align with the software testing nomenclature, but having something close will help people understand.

I think if we have a service offering schema validation, people won't care about the distinction between what plain vanilla JSON schema validation can do and the semantic parts that need more code for the deeper checks. To outside users, that is all one thing: verifying that the HAPI metadata is OK. We handle it with a two-step process, but that should be transparent to users.

And the verification part then has the more holistic job of seeing if the server implementation is fully working and all the exchanges are correct for both valid inputs and bogus or even malicious inputs.

So we could still have the verifier web site that does everything, but it could also offer people a choice to do just one or the other: hapi-server.org/verify, hapi-server.org/validate-metadata, hapi-server.org/verify-server.

rweigel commented 5 months ago

Update:

Here are the additions that I made: https://github.com/hapi-server/data-specification-schema/compare/915b60c..8061f81 (scroll down to find the -3.0.json file).

The updates to 3.0 that you pushed to the master branch only allowed bins to be an object with a refs element, and when I implemented your change, a valid test dataset I had created a while back no longer validated. As a result, I removed your update from the master branch and started over. I think the only other change you made was a correction to the pattern for HAPI, which I applied manually. I also corrected the older schemas.

Previously, I had allowed an unconstrained object for anything that could be a ref, but you only constrained bins. I had to make many additions, which took a lot of time and testing. I also added basic unit tests (https://github.com/hapi-server/data-specification-schema/blob/main/test/3.0/test.sh), but more work is needed to automate them.
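
For reference, the kind of constraint involved looks roughly like this in JSON Schema (a sketch of the pattern only, not the repo's exact text):

```json
"centers": {
  "oneOf": [
    {"type": "array", "items": {"type": "number"}},
    {
      "type": "object",
      "properties": {"ref": {"type": "string"}},
      "required": ["ref"],
      "additionalProperties": false
    }
  ]
}
```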

I had difficulty finding what you changed in 3.1 in your branch using

git log --follow -p -- HAPI-data-access-schema-3.1.json

It looks like three changes were needed:

https://github.com/hapi-server/data-specification/blob/master/hapi-3.1.0/HAPI-data-access-spec-3.1.0.md#13-additions-to-31

but I found these already in my existing 3.1 JSON schema file in the master branch. Do you recall whether I missed anything? I don't think it will be easy to merge from your branch into master because of all of the additions I made to 3.0 and your re-ordering, reformatting, and splitting; there is just too much going on in parallel for me to make sense of. I just committed a 3.1 version of the schema that I made by copying the 3.0 schema and manually adding material from my old 3.1 JSON schema. I've also made many semantic test additions at https://github.com/hapi-server/verifier-nodejs/commits/master/.

The test datasets I created long ago for 3.1 had much to do with Unicode: https://hapi-server.org/servers/#server=TestData3.1. I think additions related to Unicode did not make it into the list of changes Jon made at https://github.com/hapi-server/data-specification/blob/master/hapi-3.1.0/HAPI-data-access-spec-3.1.0.md#13-additions-to-31. Would someone create a 3.1.1 version of the spec document that includes a discussion of the allowance of Unicode in the change log?

Other notes:

I noticed that we did not address, in https://github.com/hapi-server/data-specification/blob/master/hapi-3.1.0/HAPI-data-access-spec-3.1.0.md#3610-specifying-vectorcomponents, the case where a parameter is multi-dimensional and has vector components. We need to state whether it is allowed.
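
For example, it is unclear whether a schema should accept something like the following sketch (the spec's examples attach vectorComponents to a size-[3] parameter; the multi-dimensional case is the open question):

```json
{
  "name": "B_history",
  "type": "double",
  "units": "nT",
  "fill": "-1e31",
  "size": [10, 3],
  "vectorComponents": ["x", "y", "z"]
}
```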

I can't find dataTest anywhere in your schemas (it was an addition to 3.2 /about). I found it in my draft 3.1 schema, but I think we moved it to 3.2.

coordinateSystemSchema needs work. We only allow spase2.4.1. We need to do better future-proofing.

I was going to upgrade the SSCWeb server to use vector components. I noticed we do not have something that addresses local time (values from 0 to 24).

I also noticed https://github.com/hapi-server/data-specification-schema/issues/11

I also need someone to review my semantic checks for units and labels: https://github.com/hapi-server/verifier-nodejs/blob/master/lib/checkArray_test.js.
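
For orientation, the gist of such a check is sketched below (simplified; not the actual code in checkArray_test.js):

```javascript
// Simplified sketch of a units/label semantic check: when units or
// label is an array, its total element count should match the
// parameter's size. Flattening to a total count is a simplification.
function checkArrayAttribute(param, attr) {
  const value = param[attr];
  if (!Array.isArray(value)) return []; // a single string applies to all elements
  const nElements = (param.size || [1]).reduce((a, b) => a * b, 1);
  if (value.flat(Infinity).length !== nElements) {
    return [`parameter '${param.name}': ${attr} does not match size ${JSON.stringify(param.size)}`];
  }
  return [];
}

// e.g., checkArrayAttribute({name: "flux", size: [3], units: ["eV", "eV"]}, "units")
// returns one error because only two units are given for three elements.
```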

In terms of time, the thing I need help with is reviewing and checking. I'm still finding fundamental errors in the schema that have existed for a long time; my addition of unit tests has helped reveal them. I don't think automated merging will work. I had difficulty with 3.0 -> 3.1 because there were so many changes. I also need someone to review the test datasets, e.g., https://hapi-server.org/servers/#server=TestData3.0. The idea is that a client should be able to handle responses from TestData2.0 upwards, and only new features appear in the TestDatasets. I use these to test the clients. The main question is whether there is enough for a client to test against and be confident it will work for any 3.0 server.

In the docs, we say:

   The optional stringType object allows servers to indicate that a string parameter has a special interpretation. In general, a string parameter in a dataset has values from an enumerated set, such as status values ("good", "bad", "calibrating") or data classification labels ("flare", "CME", "quiet").

   Currently, the only special stringType allowed is a URI.

Why is the first paragraph there?


We need to add:

"1412": {status: 400, "message": "HAPI error 1412: unsupported depth value"},

rweigel commented 5 months ago

diff hapi-3.1.0/HAPI-data-access-spec-3.1.0.md hapi-dev/HAPI-data-access-spec-dev.md