inveniosoftware / datacite

Python API wrapper for the DataCite API.
https://datacite.readthedocs.io

Investigate whether format verification is working #76

Open tmorrell opened 1 year ago

tmorrell commented 1 year ago

Package version (if known): v1.1.2

Describe the bug

The date format verification defined in the schema (https://github.com/inveniosoftware/datacite/blob/b15db91e1231135e5f2dba0cadca0c72ede037cf/datacite/schemas/datacite-v4.3.json#L80) may not be applied by default (see https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-formats), or may not be implemented correctly.

edager commented 7 months ago

Hi @tmorrell,

Using format purely for notational purposes is allowed. However, if the format should actually be validated, a format checker (the format_checker argument in jsonschema) needs to be supplied while validating. As far as I can tell, this is not happening, which in principle means that e.g. "publicationYear" is only being validated as a string (see dummy example below). It might be useful to have a regex instead of these formats that aren't properly validated?

from jsonschema import validate

schema = {
    "additionalProperties": False,
    "type": "object",
    "properties": {
        "publicationYear": {
            "type": "string",
            "format": "year"
        }
    },
    "required": [
        "publicationYear"
    ]
}

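# validate() returns None on success and raises ValidationError on failure.
# No format_checker is passed, so "format" is not enforced: all three
# values below pass, including "year" and the empty string.
tests = [
    {"publicationYear": "1234"},
    {"publicationYear": "year"},
    {"publicationYear": ""},
]
for test in tests:
    if validate(instance=test, schema=schema) is None:
        print(f"{test['publicationYear']} is a valid year format")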
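
For comparison, here is a minimal sketch of the two options mentioned above: supplying a format checker, or replacing the format with a regex. It assumes that jsonschema has no built-in "year" check, so one is registered by hand, and that a year means exactly four ASCII digits. The names is_year, format_schema, and pattern_schema are hypothetical and are not taken from the datacite package.

```python
from jsonschema import Draft7Validator, FormatChecker
import re

# Option 1: register a custom "year" check and pass it to the validator.
checker = FormatChecker()


@checker.checks("year")
def is_year(value):
    # Assumption: a year is exactly four ASCII digits.
    # Non-strings are left to the "type" keyword.
    if not isinstance(value, str):
        return True
    return re.fullmatch(r"[0-9]{4}", value) is not None


format_schema = {
    "additionalProperties": False,
    "type": "object",
    "properties": {
        "publicationYear": {"type": "string", "format": "year"}
    },
    "required": ["publicationYear"],
}

# Without format_checker=..., the "format" keyword would be ignored.
format_validator = Draft7Validator(format_schema, format_checker=checker)

# Option 2: use "pattern" instead, which is always enforced.
# The anchors matter because patterns are matched as a search.
pattern_schema = {
    "additionalProperties": False,
    "type": "object",
    "properties": {
        "publicationYear": {"type": "string", "pattern": "^[0-9]{4}$"}
    },
    "required": ["publicationYear"],
}
pattern_validator = Draft7Validator(pattern_schema)

values = ["1234", "year", ""]
format_results = {
    v: format_validator.is_valid({"publicationYear": v}) for v in values
}
pattern_results = {
    v: pattern_validator.is_valid({"publicationYear": v}) for v in values
}
print(format_results)
print(pattern_results)
```

With either option, only "1234" should be accepted. The pattern variant has the advantage that it does not depend on how the caller configures the validator.
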

tmorrell commented 2 months ago

This is fixed in the 4.5 schema