jhthorsen / json-validator

:cop: Validate data against a JSON schema
https://metacpan.org/release/JSON-Validator
57 stars 59 forks source link

Where/How to implement date-time format coercer? #48

Closed ugexe closed 7 years ago

ugexe commented 7 years ago

I'm looking for guidance on the best place to put date-time coercion logic. My guess is that it would belong in a JSON::Validator::OpenAPI subclass - or possibly JSON::Validator::OpenAPI itself since Mojolicious is a PREREQ_PM, supplying Mojo::Date to handle date logic without brining in DateTime?

A naive example of the coercer could be:

# monkey_patch only to install a hyphenated method name for later call to $self->$openapi_type_name(...)
monkey_patch(__PACKAGE__,
    '_validate_date-time' => sub {
        my ($self, $value, $path, $schema, $expected) = @_;
        return $self->_is_date_time($value) if !$self->{coerce}{"date-time"} or ($value = Mojo::Date->new($value)->to_datetime);
    },
);

My confusion is that date-time is a format, where it could be argued coercion is a grey area (and no coercion is done on any of the existing formats). Specifically I'm interested in accepting spaces (or any character) per RFC3339 in place of T and Z which could be handled by modifying _is_date_time regex, but I know eventually we're going to need to solve accepting other valid ISO8601 datetimes (possibly defined as a swagger string type using pattern) and coercing to RFC3339.

Thanks for any advice you can give me!

jhthorsen commented 7 years ago

Is spaces valid for a "date-time" format?

More people have asked for coercion now, so I might think about adding a generic hook for invalid data. That means that if the validator finds invalid input, it can call a code block, where "you" can fix the input or indicate that the input is indeed invalid.

leejo commented 7 years ago

Is spaces valid for a "date-time" format?

Strictly no as per the RFC: https://tools.ietf.org/html/rfc3339#section-5.6 # but they state:

Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.

And the may in that paragraph is a lowercase may so they're basically hand waving at this point.

Note that they do say the T and Z chars in the date-time can be lowercase but can be restricted to uppercase , so https://github.com/jhthorsen/json-validator/blob/master/lib/JSON/Validator.pm#L722 might need tweaking if we're going to accept lowercase T and Z chars.

I'm unconvinced on having the validator coerce input data. You should be strict and reject anything that doesn't pass, because once you start coercing input you're on the slippery slope to having your business logic coupled to the validation. However if people want it then i guess it's a legitimate feature, I think if you're going to add a hook for fixing of invalid data it needs to be strongly advised against in the documentation.

ugexe commented 7 years ago

fwiw I plan on defining the alternative acceptable string patterns and creating new types in the api document (ending up with things like anyOf(date-time, date-time-iso8601). Maybe there is a different pattern I should follow to incorporate this into our application?

For the especially simple cases where you want to do a basic transformation it might be nice to express: {"type":"string","format":"date-time","x-format-coerce-pattern":"s/^(.*?)(?:T|\s)(.*?)$/$1T$2/r"}. This seems like a concise way to work around common use of mis-implemented rfc formats and rfc discrepancies while still being explicit about what you will accept (and how the application sees the data after transformation).

jhthorsen commented 7 years ago

You really aren't "explicit about what you will accept" when you request this change. If there's a bug in the "date-time" format, we can fix that, but I'm pretty sure this issue will not get implemented.

Also, what is the difference between "date-time" and the iso8601 format? I've tried to figure it out, but I fail to understand the difference.

ugexe commented 7 years ago

Its explicit in that it declares it is following a specific rfc 'may', which as pointed out by @leejo is questionable. It's also explicit (at least in the s///r example) in that you still have to follow match a pattern - it is just extended to also show how the matches are reformatted via /r

RFC3339 is basically a subset of ISO8601. The differences are outlined here

jhthorsen commented 7 years ago

Here is what I suggest: Someone (other than me) should add a fix for the "date-time" format. I don't want it to be solved with modules such as DateTime. It need to be either a tested regex, or some light-weight module.

When it comes to "where" the date string should be coerced: I suggest where you need it to be the format you need it to be. If you're a DateTime fan, then just create the objects with something like DateTime::Format::ISO8601 or DateTime::Format::DateParse. For me personally, it's often wrong to coerce the string on input, since i might need to change it to both epoch and something that MySQL/SomeOtherDB understands later on.

ugexe commented 7 years ago
# From Mojo::Date
my $RFC3339_RE = qr/
  ^(\d+)-(\d+)-(\d+)\D+(\d+):(\d+):(\d+(?:\.\d+)?)   # Date and time
  (?:Z|([+-])(\d+):(\d+))?$                          # Offset
/xi;

Which gets does what I want... (Accepts non T character for date and time separator)

say Mojo::Date->new("1994-11-06 08:49:37Z")->to_datetime;
# 1994-11-06T08:49:37Z (Yes!)

...maybe some upstream patches are needed too

say Mojo::Date->new("1994-11-06hello world08:49:37Z")->to_datetime;
# 1994-11-06T08:49:37Z (💩!)

I'll write some test, update the regex, and send a PR sometime soon.

jhthorsen commented 7 years ago

@ugexe++ # awesome!

jhthorsen commented 7 years ago

Closing this, since we don't want coercion. We want a validator that confirms with RFC3339.

See #49.