Closed: b5 closed this issue 4 years ago
Thanks for creating an issue for this one.
Besides the missing implementation, what would the semantics be in the case of a CSV file? Probably not that the whole column is missing. And for individual fields of a row, is an empty string equivalent to a missing value? Would validation rules still be applied in that case?
@b5 `required` has been an array value since draft-04, listing the names of the required properties in a given object. That example uses it correctly already.
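For illustration, a minimal draft-04-style sketch of `required` as an array of property names (the property names here are just placeholders):

```json
{
  "type": "object",
  "properties": {
    "lat": { "type": "number" },
    "lng": { "type": "number" }
  },
  "required": ["lat", "lng"]
}
```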
This is correct.
@hbruch you can refer to this for how the validation should be set up for required fields: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/c09e3a697b4aaca2d6376156cc59d825ef2f39be/tests/draft4/enum.json#L47
To answer your question slightly further: internally we still use a JSON shape to represent the dataset and validate it. It runs on the "cell" level for the data itself. For individual fields, the `required` flag only checks for "value not present", which means even `""` will return true.
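To illustrate that point with a sketch (field names are placeholders, not the dataset in question): given a schema like

```json
{
  "type": "object",
  "properties": {
    "lat": { "type": "number", "minimum": -90, "maximum": 90 }
  },
  "required": ["lat"]
}
```

the instance `{"lat": ""}` satisfies `required` because the key is present, but the empty string is then still checked against the `lat` definition (and fails the `type` check), whereas `{}` fails `required` itself.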
I'll close this issue for now as this is behaving per spec, but feel free to continue the discussion if you have further concerns with this approach.
Thanks @Arqu for looking into this. Just to document it here for others searching for a way to define such a validation: my intended usage was to avoid validation failures for empty ("") CSV columns. As you write, "" is still present and validation will be applied to this value. I finally defined the field like this:
```json
{
  "title": "lat",
  "type": ["number", "string"],
  "anyOf": [
    {"type": "number", "maximum": 90, "minimum": -90},
    {"type": "string", "enum": [""]}
  ]
}
```
It would be great if you could comment on this in case there is a more idiomatic definition. Apart from that, I'm fine with closing this issue.
Ah, still another question regarding this validation: Could it cause issues later on, when e.g. SQL queries try to deduce column types and fail on such multi-type columns? How should null values be represented then?
Considering you want to allow empty fields and still pass validation, I think this is a good approach. Not sure if there's a much better way other than starting to make a custom version of the validator. Maybe @b5 can add some input here.
Good question! Currently for SQL we use Octosql, which does a lot of the heavy lifting. We have some pre-processing on our end, but most of that is oblivious to the execution process and data mapping. The pre-processing is mostly interested in getting the qualified names and resolving them (i.e. datasets and fields), with everything else being pushed down to the engine. On top of that, Octo also cares much less about the shapes of data, as it mostly relies on knowing which type of primitive it is to do its work. There are no advanced schema validations like the ones here that we need to account for.
There was an idea to leverage our schema metadata together with SQL, but that would lead to a lot of problems like the one you are describing. So for now we shouldn't have those issues (though it's still experimental and we have lots of other issues).
@Arqu you can use `const` instead of a single-value `enum`, which makes it a little more obvious that you are requiring an exact value. Same effect, but the intent is more clear.
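Applied to the field definition above, that would look like this (a sketch; `const` requires a validator that supports draft-06 or later):

```json
{
  "title": "lat",
  "type": ["number", "string"],
  "anyOf": [
    {"type": "number", "maximum": 90, "minimum": -90},
    {"type": "string", "const": ""}
  ]
}
```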
Thanks @handrews, makes perfect sense!
Steps to replicate:
1. In `ExampleBasic` from `schema_test.go`, add `"required": false` to the "age" section of the schema (see the sketch below).
2. Run `go test --run ExampleBasic`.
```
goroutine 1 [running]:
testing.(*InternalExample).processRunResult(0xc00019fd58, 0x0, 0x0, 0x4a5e2, 0x13065c0, 0xc000099cd0, 0xc00019fa00)
    /usr/local/Cellar/go/1.14/libexec/src/testing/example.go:89 +0x63f
testing.runExample.func2(0xbf98a40a40876428, 0x14e094, 0x1618dc0, 0xc0000a2088, 0xc0000a2008, 0xc00008e1e0, 0xc00019fd58, 0xc00019fd88)
    /usr/local/Cellar/go/1.14/libexec/src/testing/run_example.go:58 +0x105
panic(0x13065c0, 0xc000099cd0)
    /usr/local/Cellar/go/1.14/libexec/src/runtime/panic.go:967 +0x15d
github.com/qri-io/jsonschema.ExampleBasic()
    /Users/b5/qri/jsonschema/schema_test.go:43 +0x4d5
testing.runExample(0x137d84e, 0xc, 0x1396b60, 0x1390b10, 0x75, 0x0, 0x0)
    /usr/local/Cellar/go/1.14/libexec/src/testing/run_example.go:62 +0x200
testing.runExamples(0xc00019fee8, 0x1612ee0, 0x2, 0x2, 0x100)
    /usr/local/Cellar/go/1.14/libexec/src/testing/example.go:44 +0x1a8
testing.(*M).Run(0xc0000f8080, 0x0)
    /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:1201 +0x1ec
main.main()
    _testmain.go:74 +0x135
exit status 2
FAIL    github.com/qri-io/jsonschema    0.276s
```