pboettch / json-schema-validator

JSON schema validator for JSON for Modern C++

Error on validating a json schema that uses date-time format #56

Closed Gomox11 closed 4 years ago

Gomox11 commented 5 years ago

Hello!

The json schema I want to validate against uses the date-time format.

"currentTime": {
      "type": "string",
      "format": "date-time"
}

On validation the validator throws an error:

"At /currentTime of "2019-04-18T09:28:03+00:00" - a format checker was not provided but a format keyword for this string is present: date-time"

Is there any way to make the validator work with the date-time format? Any workarounds? I know that the format is defined in the standard but not required to be implemented by the validators.

Thanks in advance :)

pboettch commented 5 years ago

You can provide a format checker yourself, via the format-check function passed to the constructor of the validator class.

An example can be found here:

https://github.com/pboettch/json-schema-validator/blob/master/test/JSON-Schema-Test-Suite/json-schema-test.cpp#L19

and its usages:

https://github.com/pboettch/json-schema-validator/blob/master/test/JSON-Schema-Test-Suite/json-schema-test.cpp#L92

And I immediately see a problem regarding the usage of exceptions... I will dig into it.
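To make the shape of such a callback concrete, here is a minimal, self-contained sketch. It does not use the library: `ToyValidator`, `checkString` and `myFormatCheck` are hypothetical names made up for illustration, and the only thing taken from this thread is the assumed callback shape `(format, value)` that throws to reject a value. The real constructor and its error handling may differ, so check the linked test file.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed callback shape: (format name, string value); it throws to reject.
using FormatChecker = std::function<void( const std::string& format, const std::string& value )>;

// Hypothetical stand-in for the validator; it only models the "format" step.
class ToyValidator
{
public:
    explicit ToyValidator( FormatChecker checker = nullptr ) : checker_( std::move( checker ) ) {}

    // Returns true if the checker accepted the value, false if it threw.
    // Throws std::logic_error when no checker was supplied, mimicking the
    // "a format checker was not provided" error reported above.
    bool checkString( const std::string& format, const std::string& value ) const
    {
        if ( !checker_ ) {
            throw std::logic_error( "a format checker was not provided but a format keyword is present: " + format );
        }
        try {
            checker_( format, value );
            return true;
        } catch ( const std::exception& ) {
            return false;
        }
    }

private:
    FormatChecker checker_;
};

// Deliberately trivial user-supplied checker: any non-empty string passes as "date-time".
void myFormatCheck( const std::string& format, const std::string& value )
{
    if ( format != "date-time" ) {
        throw std::logic_error( "don't know how to validate " + format );
    }
    if ( value.empty() ) {
        throw std::invalid_argument( "empty string is not a date-time" );
    }
}
```

Constructing `ToyValidator v( myFormatCheck )` and calling `v.checkString( "date-time", "2019-04-18T09:28:03+00:00" )` returns true in this toy model, while a default-constructed `ToyValidator` throws on the same call.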

Gomox11 commented 5 years ago

Thank you very much for your quick answer :)

That looks exactly like what I need! Have you thought about documenting that somewhere on the main Github page? Seems like it could concern a lot more people. (Or I was just too blind to find it :D)

Now I'll just have to look if somebody has already written a date-time format check or if I have to implement one myself.

Many thanks :)

pboettch commented 5 years ago

The problem is that checkers are not really platform-independent. Depending on the platform you could implement it differently; especially by using external libraries or not.

The last point is the reason checkers are not delivered with this library (think of checking a URL, super hard IMO).

But, of course, checkers could be implemented in a generic way and thus be integrated here.

date-time (using strptime()) could be added in a generic manner. I would be happy to see your contribution.
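As a rough idea of what the strptime() route could look like, here is a sketch that assumes a POSIX system (strptime() is not part of standard C++). The function name `parseDateTimePrefix` is made up for illustration. It only parses the `YYYY-MM-DDThh:mm:ss` prefix; the fraction and the offset are handed back unparsed, and strptime() is generally more lenient than RFC 3339, so this alone is not a complete checker.

```cpp
#include <time.h>   // strptime() is POSIX, not standard C++

#include <string>

// Parses the leading "YYYY-MM-DDThh:mm:ss" part of `value` with strptime().
// On success the broken-down time is stored in `out` and the unparsed
// remainder (fraction, "Z" or numeric offset) in `rest`; the remainder is
// NOT validated here.
bool parseDateTimePrefix( const std::string& value, struct tm& out, std::string& rest )
{
    out = {};
    const char* end = strptime( value.c_str(), "%Y-%m-%dT%H:%M:%S", &out );
    if ( end == nullptr ) {
        return false;  // the prefix did not match the format
    }
    rest = end;
    return true;
}
```

A caller would still have to check the remainder and the day-of-month range per month.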

mxmlnkn commented 5 years ago

I strongly think that this should be supported out-of-the-box by this JSON checker for it to be called compatible as it is a specification built-in. I don't think it was the idea of the test suite to manually add 30 lines of hardcore regex code to pass it. This is exactly the complexity that should be hidden inside a library.

I don't see the platform-independence argument. The implementation you linked uses std::regex, which should be platform-independent. And if there were platform dependence, then it should be a library's job to hide it.

Ok, the URL checker is a pretty good example. But then again, JSON Schema only has the built-in URI format specifier (of which URL is only a subset), which looks to be considerably more difficult. Speaking generally, though, all format specifiers link to RFC documents that contain formal definitions, often in ABNF, which should aid in writing a validator. There even exists an ABNF to Regex converter.

As for date-time, using the ABNF from the linked RFC3339 and using the online converter to case-sensitive regex for the date-time rule, I get:

^[0-9]{4,4}\-[0-9]{2,2}\-[0-9]{2,2}t[0-9]{2,2}\:[0-9]{2,2}\:[0-9]{2,2}(\.[0-9]+){0,1}(z|((\+|\-)[0-9]{2,2}\:[0-9]{2,2}))$

for the copy-pasted ABNF:

date-fullyear   = 4DIGIT
date-month      = 2DIGIT  ; 01-12
date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                         ; month/year
time-hour       = 2DIGIT  ; 00-23
time-minute     = 2DIGIT  ; 00-59
time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                         ; rules
time-secfrac    = "." 1*DIGIT
time-numoffset  = ("+" / "-") time-hour ":" time-minute
time-offset     = "Z" / time-numoffset

partial-time    = time-hour ":" time-minute ":" time-second
                 [time-secfrac]
full-date       = date-fullyear "-" date-month "-" date-mday
full-time       = partial-time time-offset

date-time       = full-date "T" full-time
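One way to read this grammar is as named fragments that are concatenated into a single pattern. The sketch below builds a syntax-only regex that way; the function names `dateTimePattern` and `matchesDateTimeSyntax` are made up for illustration. It accepts only upper-case "T" and "Z" (RFC 3339 notes that lower case may also be allowed) and performs no range checks at all.

```cpp
#include <regex>
#include <string>

// Builds the date-time pattern from pieces that mirror the ABNF rule names.
std::string dateTimePattern()
{
    const std::string twoDigits   = "[0-9]{2}";
    const std::string fullDate    = "[0-9]{4}-" + twoDigits + "-" + twoDigits;
    const std::string secFrac     = R"(\.[0-9]+)";
    const std::string numOffset   = "[+-]" + twoDigits + ":" + twoDigits;
    const std::string timeOffset  = "(?:Z|" + numOffset + ")";
    const std::string partialTime = twoDigits + ":" + twoDigits + ":" + twoDigits + "(?:" + secFrac + ")?";
    const std::string fullTime    = partialTime + timeOffset;
    return fullDate + "T" + fullTime;
}

// Syntax check only: "2019-02-30T..." still matches because ranges are not checked.
bool matchesDateTimeSyntax( const std::string& value )
{
    static const std::regex pattern( dateTimePattern() );
    return std::regex_match( value, pattern );  // regex_match anchors at both ends
}
```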

The ABNF end-of-line comments and the restrictions have to be implemented manually though. So, starting from this and correcting some bugs and using capturing and non-capturing groups for value extraction, I'd end up with this checker:

// g++ --std=c++11 datetime.cpp && ./a.out

#include <iostream>
#include <exception>
#include <stdexcept>  // std::invalid_argument, std::logic_error
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

template<typename T>
void
rangeCheck( const T value, const T min, const T max )
{
    if ( !( ( value >= min ) && ( value <= max ) ) ) {
        std::stringstream out;
        out << "Value " << value << " should be in interval [" << min << "," << max << "] but is not!";
        throw std::invalid_argument( out.str() );
    }
}

namespace {
void formatCheck( const std::string& format, const std::string& value )
{
    if ( format == "date-time" ) {
        const static std::regex dateTimeRegex{ R"(^([0-9]{4})\-([0-9]{2})\-([0-9]{2})T([0-9]{2})\:([0-9]{2})\:([0-9]{2})(\.[0-9]+)?(?:Z|((?:\+|\-)[0-9]{2}\:[0-9]{2}))$)" };

        std::smatch matches;
        if ( !std::regex_match( value, matches, dateTimeRegex ) ) {
            throw std::invalid_argument( value + " is not a date-time string according to RFC 3339." );
        }

        const auto year = std::stoi( matches[1].str() );
        const auto month = std::stoi( matches[2].str() );
        const auto mday = std::stoi( matches[3].str() );
        const auto hour = std::stoi( matches[4].str() );
        const auto minute = std::stoi( matches[5].str() );
        const auto second = std::stoi( matches[6].str() );
        // const auto secfrac = std::stof( matches[7].str() );
        // const auto timeNumOffset = matches[8].str();

        const auto isLeapYear = ( year % 4 == 0 ) && ( ( year % 100 != 0 ) || ( year % 400 == 0 ) );

        rangeCheck( month, 1, 12 );
        if ( month == 2 ) {
            rangeCheck( mday, 1, isLeapYear ? 29 : 28 );
        } else if ( month <= 7 ) {
            rangeCheck( mday, 1, month % 2 == 0 ? 30 : 31 );
        } else {
            rangeCheck( mday, 1, month % 2 == 0 ? 31 : 30 );
        }
        rangeCheck( hour, 0, 23 );
        rangeCheck( minute, 0, 59 );
        rangeCheck( second, 0, 60 );
    } else {
        throw std::logic_error( "don't know how to validate " + format );
    }
}
}

int main( int, char** )
{
    std::vector<std::pair<std::string, bool> > dateTimeChecks{
        { "1985-04-12T23:20:50.52Z"     , true  },
        { "1996-12-19T16:39:57-08:00"   , true  },
        { "1990-12-31T23:59:60Z"        , true  },
        { "1990-12-31T15:59:60-08:00"   , true  },
        { "1937-01-01T12:00:27.87+00:20", true  },
        { "1985-4-12T23:20:50.52Z"      , false },
        { "1985-04-12T23:20:50.52"      , false },
        { "1985-04-12T24:00:00"         , false },
        { ""                            , false },
        { "2019-04-30T11:11:11+00:01"   , true  },
        { "2019-04-31T11:11:11+00:01"   , false },
        { "2019-02-28T11:11:11+00:01"   , true  },
        { "2019-02-29T11:11:11+00:01"   , false },
        { "2020-02-29T11:11:11+00:01"   , true  },
        { "2020-02-30T11:11:11+00:01"   , false },
        { "2020-02-29T23:59:59+00:01"   , true  },
        { "2020-02-29T23:59:60+00:01"   , true  },
        { "2020-02-29T23:60:59+00:01"   , false },
        { "2019-09-30T11:11:11+00:01"   , true  },
        { "2019-09-31T11:11:11+00:01"   , false }
    };

    for ( auto dateTimePair = dateTimeChecks.begin(); dateTimePair != dateTimeChecks.end(); ++dateTimePair ) {
        std::cout << "[INFO] Testing date time: " << dateTimePair->first << "\n";

        try {
            formatCheck( "date-time", dateTimePair->first );

            if ( !dateTimePair->second ) {
                std::cerr << "[ERROR] Date time string '" << dateTimePair->first << "' validated even though it should NOT!\n";
            }
        } catch ( const std::exception& exception ) {
            std::cout << "[INFO] Validation failed with: " << exception.what() << "\n";
            if ( dateTimePair->second ) {
                std::cerr << "[ERROR] Date time string '" << dateTimePair->first << "' did NOT validate even though it should!\n";
            }
        }
    }

    return 0;
}

Note that compiling regex objects is quite slow in C++! That's why it was done as a static variable.
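A small sketch of that pattern in isolation: a function-local static is initialized the first time the function runs and reused afterwards (since C++11 this initialization is also thread-safe). The counter `g_compileCount` and the function `looksLikeFullDate` exist only for this illustration.

```cpp
#include <regex>
#include <string>

int g_compileCount = 0;  // illustration only: counts how often the regex is built

bool looksLikeFullDate( const std::string& value )
{
    // The lambda runs once, on the first call; later calls reuse `fullDate`.
    static const std::regex fullDate = [] {
        ++g_compileCount;
        return std::regex( "[0-9]{4}-[0-9]{2}-[0-9]{2}" );
    }();
    return std::regex_match( value, fullDate );
}
```

After any number of calls to `looksLikeFullDate`, `g_compileCount` stays at 1.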

pboettch commented 5 years ago

Great stuff. Let's get it into the library. We could add checkers one by one, and keep the format-checker callback for user formats and for formats not supported internally.

Including your test-case.

Would you mind making a PR?

mxmlnkn commented 5 years ago

Ok, I can move the above code and the format checkers from the test.cpp into this line.

mxmlnkn commented 5 years ago

While looking for a more robust ABNF to RegEx converter I actually found a regex for uri-reference here in the readme. But I did not add it as it looks a bit too extreme. Unfortunately, I didn't get the converter to work on my system; maybe I need Ruby 1.7 even though only 2.9 is in my repositories. Maybe it would be a better idea to add another dependency to have a direct ABNF matcher in C++, although that particular one is under GPL-2.0.

Edit: Ready-made RegExes

garethsb commented 5 years ago

As far as I remember, the JSON Schema specification allows or requires implementations to ignore formats that they don't recognize. Since several of the formats are (a) expensive to implement and (b) really hard to get right (there are several competing definitions of hostname, for example), I think the default format checker should do nothing rather than throw an exception. Having a library implementation of the common formats which can optionally be used would obviously be a great idea, however!

FWIW, my current format checker is here: https://github.com/sony/nmos-cpp/blob/master/Development/cpprest/json_validator_impl.cpp#L21-L76
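To illustrate the "ignore what you don't recognize" behaviour in general terms (this is not the linked implementation), here is a minimal sketch. All names are hypothetical, including the made-up `non-empty` format used in the usage note: known formats are looked up in a registry, and anything else passes silently.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Per-format check: throws to reject a value.
using ValueChecker   = std::function<void( const std::string& value )>;
using FormatRegistry = std::map<std::string, ValueChecker>;

// Runs the registered checker for `format`, if any. Unknown formats are
// ignored instead of being reported as an error.
void lenientFormatCheck( const FormatRegistry& registry, const std::string& format, const std::string& value )
{
    const auto entry = registry.find( format );
    if ( entry == registry.end() ) {
        return;  // format not recognized: do nothing
    }
    entry->second( value );
}

// Example checker for a made-up "non-empty" format.
void requireNonEmpty( const std::string& value )
{
    if ( value.empty() ) {
        throw std::invalid_argument( "value must not be empty" );
    }
}
```

With a registry of `{ { "non-empty", requireNonEmpty } }`, a call for the unregistered format `hostname` returns without doing anything, while an empty string checked against `non-empty` throws.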

garethsb commented 5 years ago

As far as I remember, the JSON Schema specification allows or requires implementations to ignore formats that they don't recognize.

See https://json-schema.org/latest/json-schema-validation.html#rfc.section.7.2.

Implementations MAY support the "format" keyword as a validation assertion. Should they choose to do so:

  • they SHOULD implement validation for attributes defined below;
  • they SHOULD offer an option to disable validation for this keyword.

Implementations MAY add custom format attributes. Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support this keyword and/or custom format attributes.
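The "option to disable validation for this keyword" could be sketched as a thin wrapper around whatever checker is in use. This is only an illustration: `makeSwitchableChecker` and `rejectEverything` are hypothetical names, and the callback shape `(format, value)` is assumed from the checker posted earlier in this thread.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Assumed callback shape: (format name, string value); it throws to reject.
using FormatChecker = std::function<void( const std::string& format, const std::string& value )>;

// Returns `inner` when format assertions are enabled, otherwise a checker
// that accepts everything. A missing `inner` is also treated as "accept all".
FormatChecker makeSwitchableChecker( FormatChecker inner, bool assertFormats )
{
    if ( !assertFormats || !inner ) {
        return []( const std::string&, const std::string& ) {};
    }
    return inner;
}

// Strict checker used for demonstration: it rejects every value.
void rejectEverything( const std::string& format, const std::string& )
{
    throw std::invalid_argument( "rejected by strict checker: " + format );
}
```

`makeSwitchableChecker( rejectEverything, false )` yields a callback that never throws, and passing `true` returns the strict checker unchanged.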