pboettch / json-schema-validator

JSON schema validator for JSON for Modern C++

A URI with 'file' protocol is not handled as it should #252

Open Scanframe opened 1 year ago

Scanframe commented 1 year ago

Problem

When the $id uses the file protocol, as in this case (file:///mnt/server/userdata/source/json-schemas/schema/customer.schema.json), an error is reported as soon as other schema files are referenced for definitions.

By comparison, the validator from the Linux package python3-jsonschema only allows the file:// protocol for local files, which is the most logical in my opinion. (The problem there is that it does not handle relative file paths.)

Directory Structure & Command

Files

<project-dir>
├── json
│   ├── test.customer.json
└── schema
    ├── address.schema.json
    ├── customer.schema.json
    └── defs.schema.json

Command

Both commands are executed with the project root as the current directory.

Python

jsonschema -i json/test.customer.json schema/customer.schema.json

C++ json-schema-validator

json-schema-validate schema/customer.schema.json < json/test.customer.json

Main Schema File

The file below references other files. Those files can be found at this location.

{
    "$id": "file:///mnt/server/userdata/source/json-schemas/schema/customer.schema.json",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "additionalProperties": false,
    "properties": {
        "name": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
                "first": {
                    "$ref": "defs.schema.json#/definitions/firstName"
                },
                "middle": {
                    "$ref": "defs.schema.json#/definitions/middleName"
                },
                "last": {
                    "$ref": "defs.schema.json#/definitions/lastName"
                }
            },
            "required": [
                "first",
                "middle",
                "last"
            ]
        },
        "shipping_address": {
            "$ref": "address.schema.json"
        },
        "billing_address": {
            "$ref": "address.schema.json"
        },
        "parcel_size": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
                "height": {
                    "$ref": "defs.schema.json#/definitions/parcelSizeHeight"
                },
                "width": {
                    "$ref": "defs.schema.json#/definitions/parcelSizeWidth"
                },
                "depth": {
                    "$ref": "defs.schema.json#/definitions/parcelSizeDepth"
                }
            }
        }
    },
    "required": [
        "name",
        "shipping_address",
        "billing_address",
        "parcel_size"
    ]
}

Error Log

setting root schema failed
could not open file:///mnt/server/userdata/source/json-schemas/schema/address.schema.json tried with .//mnt/server/userdata/source/json-schemas/schema/address.schema.json
ERROR: '"/billing_address"' - '{"city":"'s-Gravenhage","postal_code":"2514GL","state":"Zuid-Holland","street_address":"Noordeinde 68"}': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/address.schema.json # 
ERROR: '"/name/first"' - '"Prins"': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json # /definitions/firstName
ERROR: '"/name/last"' - '"Oranje"': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json # /definitions/lastName
ERROR: '"/name/middle"' - '"van"': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json # /definitions/middleName
ERROR: '"/parcel_size/depth"' - '30': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json # /definitions/parcelSizeDepth
ERROR: '"/parcel_size/height"' - '200': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json # /definitions/parcelSizeHeight
ERROR: '"/parcel_size/width"' - '80': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json # /definitions/parcelSizeWidth
ERROR: '"/shipping_address"' - '{"city":"'s-Gravenhage","postal_code":"2513BJ","state":"Zuid-Hoilland","street_address":"Molenstraat 27"}': unresolved or freed schema-reference file:///mnt/server/userdata/source/json-schemas/schema/address.schema.json # 
schema validation failed

pboettch commented 1 year ago

The validator program you're using is just a test program, an example showing how to use the library.

The simple loader-callback is actually doing a good job, because it uses the URL-path of the root-schema to find the other sub-schemas.

If you use the library please write your own loader-script matching your infrastructure.

To solve your problem the validator library needs to be aware of the initial filename and path of the root-schema. As of today it isn't. It seems the python one is doing that.
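A minimal sketch of what "being aware of the root-schema's filename and path" could mean for a loader. The helper name `sibling_of_root` is hypothetical and not part of json-schema-validator, and it assumes all schema files sit in the same directory as the root schema, as in the directory structure above.

```cpp
#include <string>

// Hypothetical helper, not part of json-schema-validator: looks for a
// referenced schema next to the root schema file on disk.
// Assumption: all schema files share one directory.
std::string sibling_of_root(const std::string &root_schema_file,
                            const std::string &uri_path)
{
    // Directory part of the root schema file name ("" if there is none).
    const auto root_slash = root_schema_file.rfind('/');
    const std::string dir = root_slash == std::string::npos
                                ? ""
                                : root_schema_file.substr(0, root_slash + 1);

    // Last path segment of the referenced schema's URI path.
    const auto uri_slash = uri_path.rfind('/');
    const std::string name = uri_slash == std::string::npos
                                 ? uri_path
                                 : uri_path.substr(uri_slash + 1);

    return dir + name;
}
```

With the command line from the report, the root schema file is schema/customer.schema.json, so a reference that resolves to .../schema/address.schema.json would be opened as schema/address.schema.json, whatever directory the $id names.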

If you don't want to integrate the library in your program and just want to use an executable, why not stick with the python one?

Otherwise, do not hesitate to suggest a patch for the example so that it does what you want.

Btw. Isn't it very strange that the $id-tag contains a local file path?

Scanframe commented 1 year ago

Thanks for responding.

> Btw. Isn't it very strange that the $id-tag contains a local file path?

When your system/application has no access to webservers then this is the only option.

I assumed the $id-tag in the main schema can only contain a URI to identify its resource. The other linked or referenced schemas can use a relative location to the main one.

I tried to fix it in the code, but the path is prefixed with ./, which is fine for the http protocol but not for the file protocol. From there it became too complex to figure out in a short time what to change to make it work.

pboettch commented 1 year ago

> > Btw. Isn't it very strange that the $id-tag contains a local file path?
>
> When your system/application has no access to webservers then this is the only option.

No, it seems common usage to put http-addresses as $ids, even though nothing is looking up anything on the internet.

> I tried to fix it in the code, but the path is prefixed with ./, which is fine for the http protocol but not for the file protocol. From there it became too complex to figure out in a short time what to change to make it work.

The library also does not support ../-relative path references. This might be related. Someone with time needs to take a look.
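For illustration, supporting ../ references comes down to collapsing dot segments in the resolved path before it is used. The sketch below is a simplified take on that step, not the library's code; `collapse_dot_segments` is a hypothetical name, and it assumes an absolute path and drops any trailing slash.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collapses "." and ".." segments of an absolute path.
// Simplification: empty segments and a trailing '/' are dropped.
std::string collapse_dot_segments(const std::string &path)
{
    std::vector<std::string> kept;
    std::istringstream in(path);
    std::string segment;

    while (std::getline(in, segment, '/')) {
        if (segment.empty() || segment == ".")
            continue;            // nothing to keep for "" and "."
        if (segment == "..") {
            if (!kept.empty())
                kept.pop_back(); // ".." removes the previous segment
            continue;
        }
        kept.push_back(segment);
    }

    std::string result;
    for (const auto &s : kept)
        result += "/" + s;
    return result.empty() ? "/" : result;
}
```

A reference such as ../defs.schema.json, appended to a base directory /schema/sub/, would then end up as /schema/defs.schema.json.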

Scanframe commented 1 year ago

> No, it seems common usage to put http-addresses as $ids, even though nothing is looking up anything on the internet.

My understanding is that when the $id is omitted from the main schema file, the secondary referenced schema files are not found at all. The $id sets the location where the other schema files are to be found. When using only a single schema file, nothing in the $id tag matters since nothing is externally referenced.
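The role of $id as the base for relative references can be sketched as follows. This is a deliberately simplified resolver, not the library's implementation: `resolve_ref` is a hypothetical name, and it only handles a plain file name with an optional #fragment (no ../, no absolute paths, no full URIs).

```cpp
#include <string>
#include <utility>

// Hypothetical helper: resolves a relative "$ref" against the base "$id".
// Returns {absolute schema URI, fragment after '#'}.
std::pair<std::string, std::string> resolve_ref(const std::string &base_id,
                                                const std::string &ref)
{
    // Split the reference at the first '#'.
    const auto hash = ref.find('#');
    const std::string file = ref.substr(0, hash);
    const std::string fragment =
        hash == std::string::npos ? "" : ref.substr(hash + 1);

    // A fragment-only reference points into the base document itself.
    if (file.empty())
        return {base_id, fragment};

    // Replace everything after the last '/' of the base with the file name.
    const auto slash = base_id.rfind('/');
    return {base_id.substr(0, slash + 1) + file, fragment};
}
```

Applied to the main schema above, defs.schema.json#/definitions/firstName resolves to file:///mnt/server/userdata/source/json-schemas/schema/defs.schema.json with the fragment /definitions/firstName, which is the URI and pointer shown in the error log.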

pboettch commented 1 year ago

The other schema-validators I saw all use callbacks for the user to handle the loading of additional schemas. So it's up to the application to handle the evaluation of the URL in $id.

The problem you have is probably that file:// is not handled correctly in the URL-class.

OK, but you are also using an example program which is not really designed to be generic. Maybe we can fix it there? In the loader callback, if the protocol is file, we remove the leading '.'?
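A minimal sketch of that idea, kept independent of the library so it stands alone. `local_filename` is a hypothetical helper; `scheme` and `path` stand for the already-parsed parts of the schema URI, and the "./" prefix for other schemes is taken from the "tried with" line in the error log.

```cpp
#include <string>

// Hypothetical helper, not part of json-schema-validator: picks the local
// file name a loader callback would try to open for a schema URI.
std::string local_filename(const std::string &scheme, const std::string &path)
{
    // A file: URI already carries an absolute filesystem path,
    // so it is used unchanged.
    if (scheme == "file")
        return path;

    // For other schemes, keep the behaviour visible in the error log:
    // look the path up relative to the current directory.
    return "./" + path;
}
```

For the failing reference, this yields /mnt/server/userdata/source/json-schemas/schema/address.schema.json instead of .//mnt/server/userdata/source/json-schemas/schema/address.schema.json.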

Scanframe commented 1 year ago

> OK, but you are also using an example program which is not really designed to be generic. Maybe we can fix it there? In the loader callback, if the protocol is file, we remove the leading '.'?

I can make a contribution to try to fix it.

BTW...

I used the FetchContent_xxxxx CMake functions instead of the Hunter ones. This needs at least CMake 3.11 for the FetchContent module, and 3.14 for FetchContent_MakeAvailable as used below.

file: cmake/nlohmann_jsonConfig.cmake

# FetchContent was added in CMake 3.11 (FetchContent_MakeAvailable in 3.14); it downloads during the configure step
include(FetchContent)
# Import Json library.
FetchContent_Declare(
    nlohmann-json
    GIT_REPOSITORY https://github.com/nlohmann/json
    GIT_TAG v3.8.0
    )
# Adds nlohmann_json::nlohmann_json
FetchContent_MakeAvailable(nlohmann-json)

Addition in main CMakeLists.txt

# Make it so our own packages are found and also the ones in the sub-module library.
list(APPEND CMAKE_PREFIX_PATH "${CMAKE_CURRENT_LIST_DIR}/cmake")