openactive / data-model-validator

The OpenActive data model validator library
MIT License
1 stars 1 forks source link

Schedule validation #365

Open nickevansuk opened 3 years ago

nickevansuk commented 3 years ago

This issue actually likely contains a number of rules, and the various requirements below could be split into different rules in a number of ways. Generally it's better to keep rule files as focussed as possible, with a good separation of concerns between the validation being conducted by each file.

Validate recurrence rule from Schedule

The validator should include a rule that validates that a Schedule contains a valid iCal recurrence rule.

There's already schedule generation logic in the conformance services, so it should be easy to convert this into some validation logic for the same fields, creating an rrule.

Note the logic above does not include scheduleTimezone (which it definitely should), so this will need to be added.

If there are not enough properties to create an rrule (e.g. the rrule library might return an error saying this?), display #292

If the properties do not create a valid rrule, return an error from the rrule library that explains what’s wrong, as a FAILURE

Using the rrule, validate that all exceptDate values are actually part of the recurrence defined by the rrule, and produce a WARNING if any are outside of the rrule.

Also note this other rrule library that has better support for timezones, if the library used above is problematic https://www.npmjs.com/package/@rschedule/core - however it has much lower usage so the rrule library is preferred if possible.

The Bookwhen (http://data.bookwhen.com/) and Open Sessions (https://opensessions.io/openactive) feeds should contain good test data for this.

Validate specific properties within PartialSchedule (target only PartialSchedule)

Validate specific properties within Schedule (target only Schedule)

Validate specific properties within Schedule and PartialSchedule (target both)

Also consider whether extending the below could be an approach to implementing any of the above:

See further documentation:

Lathrisk commented 3 years ago

Task list:

Lathrisk commented 3 years ago

@thill-odi @nickevansuk

I've been looking at the RRule validation today and wanted to ask a couple questions. There is very little information required to generate a valid RRule. I think fundamentally it is just a frequency that is required (iCal spec). The requirement for a Schedule from schema.org.

Do we want to raise any warnings or failures in addition to the these requirements, for fields such as startDate/startTIme, or duration? If my understand is correct these fields could surface in other locations in the data? (e.g. courseInstance example

nickevansuk commented 3 years ago

@Lathrisk so we need enough data to generate at least one occurrence when ignoring exceptDate - my understanding is that many properties are only valid in certain value combinations - so perhaps one could try to simply generate the first occurance in the schedule and see if it's possible? So assume in this case yes it will need startDate and startTime

If the schedule does not generate any occurrences it should be a FAILURE

All data used to generate a Schedule must come from within the Schedule only; it is entirely self-contained.

nickevansuk commented 3 years ago

Also, to help with:

Using the rrule, validate that all exceptDate values are actually part of the recurrence defined by the rrule, and produce a WARNING if any are outside of the rrule.

Potential logic as follows for the exceptDate check:

Here's the schedule generation code that imin uses, complete with a working timezone implementation:

https://gist.github.com/lukehesluke/b00a20caf22abf26b38c0c5e3238e150

Note specifically that process.env.TZ must be set when the validator starts in order for RRule to behave as expected in all environments

You won't need all of the code above, but it should give you a good head start here :)

Lathrisk commented 3 years ago

Thanks, @nickevansuk. I had started using rrule with luxon so will take a look at this as a more predictable alternative.

Lathrisk commented 3 years ago

@nickevansuk (cc @thill-odi) I've been working through the timezone code, but I have been trying to understand the reasoning for implementing this as part of this rule. Surely the primary requirement is ensuring that the time, date, and timezone are provided, rather than the specific implementation of the RRule generation that we apply here? The generated recurrence rule is a byproduct of the recommendations in the developer docs, rather than concrete values in the OpenActive data that we need to validate?

Is there something important that I am missing?

nickevansuk commented 3 years ago

Not sure I'm following @Lathrisk - the only reason for the timezone's need to be accurate is for the exceptDate validation (as per https://github.com/openactive/data-model-validator/issues/365#issuecomment-806237716).

So perhaps I've misunderstood? Also should have mentioned previously that if using luxon is something that gets around needing to set process.env.TZ that sounds great (as setting that globally isn't ideal for a library) (unless this was something @lukehesluke already tried when working on the Gist above?)

Lathrisk commented 3 years ago

Thanks @nickevansuk, that does make sense wrt to the exceptDate's. I think what hasn't been clear to me is the format of the exceptDate field. Or I had made an assumption about it'd format.

Is this always in UTC? I notice that the examples provide the closing Z but I couldn't find that in the documentation. I had previously assumed that would be listed in the same timezone as the scheduledTimezone field.

If I understand correctly then we are generating the RRule with the 'correct' UTC dates so that we can compare this to the exceptDates which are in UTC?

lukehesluke commented 3 years ago

unless this was something @lukehesluke already tried when working on the Gist above?

I attempted to use the tzid/Luxon solution specified in their docs here 👉 https://github.com/jakubroztocil/rrule#timezone-support but it's not machine-independent. In their example, they show that the tzid is offset against the system timezone (which causes issues when running across machines which do not all share the same system timezone). It additionally has the same issue of not producing valid dates when considering the timezone designator. They are only valid if you ignore the timezone designator (which will be UTC if following rrules guidelines) and assume that the system's timezone is being used instead.

For example, it failed the "British Summer Time test", presumably because my local machine was already set to Europe/London

> var myRule = new RRule({
  freq: RRule.WEEKLY,
  interval: 1,
  byweekday: [RRule.MO, RRule.FR],
  dtstart: new Date(Date.UTC(2021, 2, 18, 7, 30)),
  until: new Date(Date.UTC(2021, 3, 6)),
  tzid: 'Europe/London'
});
> myRule.all();
[ 2021-03-19T07:30:00.000Z,
  2021-03-22T07:30:00.000Z,
  2021-03-26T07:30:00.000Z,
  // this should be 2021-03-29T06:30:00.000Z (i.e. the UTC version of 2021-03-29T07:30:00.000+01:00)
  // as BST occurs on the 28th March. Same for the subsequent ones:
  2021-03-29T07:30:00.000Z,
  2021-04-02T07:30:00.000Z,
  2021-04-05T07:30:00.000Z ]

as setting that globally isn't ideal for a library

Agreed! You might be able to use the tzid/luxon solution without requiring that process.env.TZ is a special value by getting the resulting dates and then transforming them using whatever the current value for process.env.TZ is.

e.g. something like (NOTE: I haven't actually tested this):

return datesWithWeirdTimezones.map(date =>
  DateTime.fromJSDate(date)
    .toUTC()
    .setZone(process.env.TZ, { keepLocalTime: true }));

This could work if TZ is guarantueed to always have a value

EDIT: I can confirm that process.env.TZ is not guarantueed to always have a value - maybe there's another way of getting the system IANA timezone?