opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0
837 stars 57 forks source link

Add basic valid and invalid tests for the json schema #141

Closed jorisvandenbossche closed 1 year ago

jorisvandenbossche commented 2 years ago

Closes https://github.com/opengeospatial/geoparquet/issues/135 (rework of https://github.com/opengeospatial/geoparquet/pull/134 to focus on json schema for now)

jorisvandenbossche commented 2 years ago

cc @m-mohr do you think this is a sensible approach to test the json schema? (ensure it passes when it should and errors when it should) Feedback would be very welcome (I never did this before)

m-mohr commented 2 years ago

I'll have a look in the next days :-)

jorisvandenbossche commented 2 years ago

OK, I updated this PR for the latest changes in the spec and updated to use pytest. This should be ready for review.

m-mohr commented 1 year ago

Looks reasonable to me. Why commit the test files when you can always generate them on the fly though? May lead to PRs where people update them by hand...

m-mohr commented 1 year ago

I'd like to propose some additional checks:

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["invalid_column_object"] = {"foo": "bar"}
invalid_cases["invalid_column_object"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["bbox"] = [0,0,0]
invalid_cases["3_element_bbox"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["bbox"] = [0,0,0,0]
valid_cases["4_element_bbox"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["bbox"] = ["0","0","0","0"]
invalid_cases["bbox_invalid_type"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["bbox"] = [0,0,0,0,0]
invalid_cases["5_element_bbox"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["bbox"] = [0.0,0.0,0.0,0.0,0.0,0.0]
valid_cases["6_element_bbox"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["bbox"] = [0,0,0,0,0,0,0]
invalid_cases["7_element_bbox"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"][""] = metadata["columns"]["geometry"]
invalid_cases["empty_column_name"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["primary_geometry"] = ""
invalid_cases["empty_primary_geometry"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["geometry_types"] = ["Point", "Point"]
invalid_cases["geometry_type_uniqueness"] = metadata

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["geometry"]["geometry_types"] = ["PointZ"]
invalid_cases["geometry_type_missing_space"] = metadata

~Some of these may fail with the existing JSON Schema and may need to be added in #131, where we can fix this. I could also add them all in #131.~ Now that #131 was merged, you should be able to just add them here.

jorisvandenbossche commented 1 year ago

@m-mohr thanks for those additional cases! This one I don't fully understand what it is testing for:

metadata = copy.deepcopy(metadata_template)
metadata["columns"]["primary_geometry"] = ""
invalid_cases["empty_primary_geometry"] = metadata
m-mohr commented 1 year ago

@jorisvandenbossche In #153 we require a minimum length of 1 for the primary geometry and I'm trying to check whether the schema fails for an empty primary geometry.

But I just now realize that it's wrong and should be:

metadata = copy.deepcopy(metadata_template)
metadata["primary_geometry"] = ""
invalid_cases["empty_primary_geometry"] = metadata
jorisvandenbossche commented 1 year ago

OK, that one is already included!

tschaub commented 1 year ago

Would there be support for moving the test_json_schema.py script to the scripts directory? It feels like that directory has the most complete story around managing python dependencies.

jorisvandenbossche commented 1 year ago

I wouldn't mind that, certainly now it is only a single file (I am also already making use of the python dependencies that are installed for the scripts to run the tests in CI)

tschaub commented 1 year ago

I moved this script in #159 and updated it to read the version constant from the schema file. So the tests now assert that metadata without a version or with a version that is different than the schema version is invalid.