openreferral / specification

The Human Services Data Specification - a data exchange format developed by the Open Referral Initiative
https://openreferral.org

datapackage.json - schedule - timezone might be better as labels #265

odscjames opened this issue 2 years ago (status: open)

odscjames commented 2 years ago

(maybe related to https://github.com/openreferral/specification/issues/264 )

Timezone may be better as a label (e.g. "Europe/London") than as an offset (e.g. "+01:00").
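To illustrate the difference, here is a minimal sketch using Python's standard `zoneinfo` module (Python 3.9+, assuming the system has a tz database installed). The label "Europe/London" resolves to a different UTC offset in winter and summer, while a stored "+01:00" offset is only correct for part of the year.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")    # timezone stored as a label
fixed = timezone(timedelta(hours=1))  # timezone stored as the offset "+01:00"

winter = datetime(2021, 1, 15, 10, 0, tzinfo=london)
summer = datetime(2021, 7, 15, 10, 0, tzinfo=london)

# The label carries London's DST rules, so the offset depends on the date.
print(winter.utcoffset(), winter.tzname())  # 0:00:00 GMT
print(summer.utcoffset(), summer.tzname())  # 1:00:00 BST

# 10:00 at a fixed +01:00 is only 09:00 on a London clock in January.
misread = datetime(2021, 1, 15, 10, 0, tzinfo=fixed).astimezone(london)
print(misread.strftime("%H:%M"))  # 09:00
```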

1)

This is because if a location changes its offset or DST rules, any times that were stored in UTC or with a fixed offset under the old rules end up incorrect. https://codeblog.jonskeet.uk/2019/03/27/storing-utc-is-not-a-silver-bullet/

2)

If an event happens at the same local time all year round but that location observes some kind of DST, then you need to be able to repeat the event across the DST shift. If you only have the UTC offset and don't know the actual time zone, you can be in trouble. https://derickrethans.nl/storing-date-time-in-database.html
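A minimal sketch of the recurrence problem, again assuming Python 3.9+ `zoneinfo`. The helper `weekly_occurrences` is hypothetical, written only for illustration: it steps the calendar date one week at a time and attaches either a zone label or a fixed offset. With the label, a Friday 10:00 event stays at 10:00 London time across the 28 March 2021 clock change; with a fixed "+00:00" offset it drifts to 11:00.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def weekly_occurrences(first_day: date, count: int, at: time, tz) -> list[datetime]:
    """Hypothetical helper: step the local calendar date weekly, then attach tz."""
    return [datetime.combine(first_day + timedelta(weeks=i), at, tzinfo=tz)
            for i in range(count)]


london = ZoneInfo("Europe/London")
start = date(2021, 3, 19)  # a Friday; UK clocks go forward on 28 March 2021

by_label = weekly_occurrences(start, 3, time(10, 0), london)
by_offset = weekly_occurrences(start, 3, time(10, 0), timezone.utc)  # "+00:00" stored

print([d.astimezone(london).strftime("%m-%d %H:%M") for d in by_label])
# ['03-19 10:00', '03-26 10:00', '04-02 10:00']  local time stays at 10:00
print([d.astimezone(london).strftime("%m-%d %H:%M") for d in by_offset])
# ['03-19 10:00', '03-26 10:00', '04-02 11:00']  local time drifts after the shift
```

With the label, it is the UTC instant that moves (10:00Z, then 09:00Z after the change), which is exactly what the organiser intends.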

odscjames commented 2 years ago

OK, I've had a deeper look, and I think this is important.

Until & valid_to

I'm not clear why we have both sets of fields. The schedule page says:

Consuming systems may infer from this that as the valid_to date approaches, they should seek to find out what future opening hours are for the service; they shouldn’t necessarily infer that a service has ended - just that they don’t have any more opening time information. This is in contrast to defining the service’s opening hours by use of the until iCal field, which explicitly defines the end of a recurring event, and therefore a consuming system may infer that the service is no longer available after that time.

This seems to be based on a plain-language reading of "until", and also on the assumption that an "until" value applies to the whole service rather than to that specific schedule; remember that a service can have many different schedules. This seems to me to put much more meaning on a single schedule row than it can bear. Should consuming systems assume that a service is no longer available as soon as any single one of its schedule rows has an "until" value? This text implies that.

We need to draw a clear distinction between the states of:

  1. service is no longer available
  2. service is generally available but this data set does not have details on the exact schedule at the moment
  3. service is generally available but not right now - it's temporarily closed for a holiday, venue refurbishment, or similar

I would suggest we do this by putting clear data in different places:

  1. The service object has a status field (text format) - use that, or something else on the service object.
  2. Schedule rows do not exist that cover the current moment
  3. This one is less clear. Schedule rows should not exist that cover the temporary closure, but maybe the status field on the service could also be used?

This clearly places responsibility for signalling permanent closure on the service table, and responsibility for the other states (unknown schedule or temporary closure) on the schedule table.
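A minimal sketch of how a consumer might apply this split. It assumes hypothetical `status` values ("inactive" for permanent closure and "temporarily closed", following the open question in point 3) and treats `valid_from`/`valid_to` as inclusive dates; these values and the `availability` function are illustrative, not taken from the specification.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Availability(Enum):
    NO_LONGER_AVAILABLE = "service is no longer available"  # state 1
    SCHEDULE_UNKNOWN = "no schedule data for this date"      # state 2
    TEMPORARILY_CLOSED = "temporarily closed"                # state 3
    SCHEDULED = "available, schedule known"


@dataclass
class ScheduleRow:
    valid_from: Optional[date] = None  # inclusive; None means no lower bound
    valid_to: Optional[date] = None    # inclusive; None means no upper bound

    def covers(self, day: date) -> bool:
        after_start = self.valid_from is None or self.valid_from <= day
        before_end = self.valid_to is None or day <= self.valid_to
        return after_start and before_end


def availability(service_status: str, rows: list[ScheduleRow], day: date) -> Availability:
    # Closure is read from the service object only (hypothetical status values),
    # never inferred from the end date of a single schedule row.
    if service_status == "inactive":
        return Availability.NO_LONGER_AVAILABLE
    if service_status == "temporarily closed":
        return Availability.TEMPORARILY_CLOSED
    # Otherwise, a gap in schedule coverage only means "we don't know".
    if not any(row.covers(day) for row in rows):
        return Availability.SCHEDULE_UNKNOWN
    return Availability.SCHEDULED


rows = [ScheduleRow(date(2021, 2, 5), date(2021, 7, 30))]
print(availability("active", rows, date(2021, 8, 6)).name)  # SCHEDULE_UNKNOWN
```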

[ Note: https://developers.openreferraluk.org/Guidance/ avoids this by not having "until". ]

valid_from && dtstart

I'm not sure why we need dtstart when any calculation could also be done from the valid_from date. Is there any case in which these fields would have different values?

A worked example: New York service for 6 months over a DST

A service runs in New York every Friday from 10am to 4pm at the start of 2021.

schedule row:
  id: 1234
  service_id: 46
  valid_from: 2021-02-05
  valid_to: 2021-07-30
  timezone: ???
  freq: WEEKLY
  byday: FR
  opens_at: ???
  closes_at: ???

First problem: What should timezone be?

New York time is UTC-05:00 or UTC-04:00, depending on whether New York's DST is in effect. These 6 months cross the DST change (clocks go forward on March 14, 2021).
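A quick check of this, as a minimal sketch assuming Python 3.9+ `zoneinfo` with the IANA label "America/New_York": the first and last Fridays of the example row fall on opposite sides of the change, so no single offset is right for the whole row.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

new_york = ZoneInfo("America/New_York")

# Opening time on the first and last Fridays of the example schedule row.
first = datetime(2021, 2, 5, 10, 0, tzinfo=new_york)
last = datetime(2021, 7, 30, 10, 0, tzinfo=new_york)

for moment in (first, last):
    hours = moment.utcoffset().total_seconds() / 3600
    print(moment.date(), moment.tzname(), hours)
# 2021-02-05 EST -5.0
# 2021-07-30 EDT -4.0
```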

Second problem: What should opens_at / closes_at be?

The field description makes clear that these values should not be in local time but in UTC or with a timezone offset.

If I want to express them in UTC ("Z" time), I don't know whether to use -4 or -5 to calculate the UTC time.

If I want to express them as an offset ("10:00-05:00" for start) then:

In both cases, someone who did not know that this dataset was in New York time, and who wanted to expand this schedule into a list of all occurrences, would run into trouble.
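The root cause can be shown in a minimal sketch (Python 3.9+ `zoneinfo` assumed; `distinct_utc_times` is a hypothetical helper): expanding the intended schedule, 10am to 4pm New York time every Friday between the row's valid_from and valid_to, yields two different UTC times for each of opens_at and closes_at, so neither a single "Z" value nor a single offset can represent the row.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

new_york = ZoneInfo("America/New_York")


def distinct_utc_times(first: date, last: date, local: time) -> set[str]:
    """Distinct UTC clock times of a weekly event held at a fixed New York local time."""
    seen = set()
    day = first
    while day <= last:
        moment = datetime.combine(day, local, tzinfo=new_york)
        seen.add(moment.astimezone(timezone.utc).strftime("%H:%MZ"))
        day += timedelta(weeks=1)
    return seen


start, end = date(2021, 2, 5), date(2021, 7, 30)  # valid_from / valid_to of the example row
print(sorted(distinct_utc_times(start, end, time(10, 0))))  # ['14:00Z', '15:00Z']
print(sorted(distinct_utc_times(start, end, time(16, 0))))  # ['20:00Z', '21:00Z']
```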

Let's say I have made a guess and listed:

  id: 1234
  service_id: 46
  valid_from: 2021-02-05
  valid_to: 2021-07-30
  timezone: -4
  freq: WEEKLY
  byday: FR
  opens_at: 10:00-04:00
  closes_at: 16:00-04:00

They would probably try to do this with UTC times and get a list like:

Translating those UTC times to New York local times gives you:

(I think I have the direction right, but the point is:) the local time of the event suddenly shifts by an hour, which is not intended!
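A minimal sketch of that expansion, assuming Python 3.9+ `zoneinfo`; `expand_fixed` is a hypothetical helper standing in for a consumer that trusts the guessed "-04:00" offset. Every occurrence lands on 14:00 UTC, which reads as 09:00 in New York before March 14 and 10:00 after it, confirming the direction of the shift.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

new_york = ZoneInfo("America/New_York")
guessed = timezone(timedelta(hours=-4))  # the guessed row's "-4" / "10:00-04:00"


def expand_fixed(first: date, count: int, opens: time) -> list[datetime]:
    # Trusting the offset pins every occurrence to the same UTC time of day.
    return [datetime.combine(first + timedelta(weeks=w), opens, tzinfo=guessed)
            for w in range(count)]


# Fridays either side of the 14 March 2021 DST change.
for occ in expand_fixed(date(2021, 3, 5), 4, time(10, 0)):
    utc = occ.astimezone(timezone.utc)
    local = occ.astimezone(new_york)
    print(utc.strftime("%Y-%m-%d %H:%MZ"), "->", local.strftime("%H:%M %Z"))
# 2021-03-05 14:00Z -> 09:00 EST
# 2021-03-12 14:00Z -> 09:00 EST
# 2021-03-19 14:00Z -> 10:00 EDT
# 2021-03-26 14:00Z -> 10:00 EDT
```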

They could ignore the time zone information and expand in local time, but then, if they convert those local times to UTC using the stated offset (as some calendar systems will), some of the UTC times will be incorrect.
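A minimal sketch of this second failure mode, assuming Python 3.9+ `zoneinfo`; `utc_mismatches` is a hypothetical helper. It expands the guessed row as plain local 10:00 times, converts them to UTC with the stated -04:00 offset, and compares against the UTC instants the organiser meant (New York time). Over the row's 26 Fridays, the six before the March 14 change come out an hour wrong.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

new_york = ZoneInfo("America/New_York")
stated_offset = timezone(timedelta(hours=-4))  # offset taken from "10:00-04:00"


def utc_mismatches(first: date, count: int, opens: time) -> list[date]:
    """Dates where offset-based UTC differs from the UTC the organiser meant."""
    wrong = []
    for week in range(count):
        day = first + timedelta(weeks=week)
        local = datetime.combine(day, opens)              # expanded as plain local time
        via_offset = local.replace(tzinfo=stated_offset)  # converted using the stated offset
        via_label = local.replace(tzinfo=new_york)        # the intended New York time
        if via_offset != via_label:  # aware datetimes compare as UTC instants
            wrong.append(day)
    return wrong


bad = utc_mismatches(date(2021, 2, 5), 26, time(10, 0))
print(len(bad), bad[0], bad[-1])  # 6 2021-02-05 2021-03-12
```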

Now, the point has been raised that datasets tend to be local, so the case of someone who "did not know this dataset was NY time" is probably not going to arise. But if so:

(Note: this example only covers a service that operates across a DST shift, and that was enough to show problems; this is point 2 in my previous message. I haven't worked up an example for point 1 yet, but these recommendations will also cover it.)

I would like to consider:

[ I would like to recheck these and work up some examples, but hopefully this is enough to start a conversation! ]