open-contracting / extension_registry

A registry of extensions to the Open Contracting Data Standard
Apache License 2.0

New Extension Registry: validation of extensions.csv #92

Closed odscjames closed 6 years ago

odscjames commented 6 years ago

This is a spin-off from #90.

Note this includes validation that can be done on extensions.csv only - there are additional validations we can do on other files, but that's a different issue.

Currently the following validations are applied to extensions.csv

What other validations could be applied?

As for the method of applying validation, validation is currently done by raising an exception in ocdsextensionregistry/validate.py or in the validate_extension_registry_data_only method of ocdsextensionregistry.models.ExtensionModel. In ocdsextensionregistry/tests/test_validate.py there are pytest tests that are already run by Travis.

We could develop new validations directly in Python, or we could try the technique of converting a line to JSON and then using JSON Schema. I suspect that in many cases Python would be the simpler or more powerful route, but I note that both can be tested by pytest in the same way as the current validations. The important thing is to agree on the criteria - as long as there are some tests I'm not too fussed about the technique.
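For illustration only, the plain-Python route might look roughly like the sketch below. It is not the registry's actual code: the `RowValidationError` class, the `validate_rows` function and the `Id` column name are all hypothetical. It checks a cross-row rule (unique identifiers), which is the kind of rule a per-row schema cannot express, and it ends with a pytest-style test function.

```python
import csv
import io


class RowValidationError(Exception):
    """Hypothetical exception type; the registry's real exception classes may differ."""


def validate_rows(text):
    """Raise RowValidationError on the first problem found, else return the row count."""
    seen_ids = set()
    count = 0
    # Data starts on line 2, after the header. This numbering assumes that no
    # field contains an embedded newline.
    for line_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        identifier = row.get("Id")  # "Id" is an assumed column name
        if not identifier:
            raise RowValidationError(f"line {line_number}: Id is missing or empty")
        if identifier in seen_ids:
            raise RowValidationError(f"line {line_number}: duplicate Id {identifier!r}")
        seen_ids.add(identifier)
        count += 1
    return count


def test_duplicate_id_is_rejected():
    # pytest collects functions named test_*; plain asserts are enough.
    try:
        validate_rows("Id\nlots\nlots\n")
    except RowValidationError as error:
        assert "duplicate" in str(error)
    else:
        raise AssertionError("expected RowValidationError")
```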

jpmckinney commented 6 years ago

I prefer using JSON Schema (restoring parts of entry-schema.json), because we use JSON Schema everywhere else, and it's straightforward to re-implement (use a DictReader and then validate each row with a JSON Schema validator, as done here), rather than writing bespoke Python methods.

Also, we should be strict about the format of booleans (true or false). There's no need to support a wide variety of truthy or falsy values – and we can remove the utility code that implements that flexibility.
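As a rough illustration of what strict boolean handling could mean in practice, a parser might accept exactly the two lowercase literals and reject everything else. The `parse_strict_boolean` helper below is hypothetical and is not part of the registry's code.

```python
def parse_strict_boolean(value):
    """Convert the literal strings "true" and "false"; reject any other value."""
    if value == "true":
        return True
    if value == "false":
        return False
    # No case folding and no aliases such as "yes", "1" or "TRUE".
    raise ValueError(f"expected 'true' or 'false', got {value!r}")
```

With a JSON Schema validator, a comparable rule would be an `enum` restricted to those two strings.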

@odscjames Is this something you can take forward in the short term?

odscjames commented 6 years ago

Can I just confirm the criteria - all the points I listed in "What other validations could be applied?" are things you think we should be doing?

odscjames commented 6 years ago

Also, are there any other criteria you can think of to add to that list?

jpmckinney commented 6 years ago

Yes, the validations should be as strict as possible - I may think of other criteria once I see a PR. However, at a later time, we will want to relax the dependency on GitHub, so we don't need to validate GitHub URL formats.

jpmckinney commented 6 years ago

Closed via #99 🎉