openmobilityfoundation / mobility-data-specification

A data standard to enable right-of-way regulation and two-way communication between mobility companies and local governments.
https://www.openmobilityfoundation.org/about-mds/
Other
676 stars 232 forks source link

Reports - simplify start_date column #809

Closed wellorder closed 1 year ago

wellorder commented 1 year ago

Explain pull request

The current handling of time in /reports is strange for two reasons:

This PR handles a few of the action items from that discussion.

Is this a breaking change

This is a breaking change. The intent is for this to be included in MDS 2.0.

Impacted Spec

Which spec(s) will this pull request impact?

Additional context

See the discussion from https://github.com/openmobilityfoundation/mobility-data-specification/wiki/Web-conference-notes,-2022.07.07-(MDS-Working-Group) and the discussion at https://github.com/openmobilityfoundation/mobility-data-specification/issues/672

schnuerle commented 1 year ago

What do you think about #804 for adding a provider_id to reports?

wellorder commented 1 year ago

In response to https://github.com/openmobilityfoundation/mobility-data-specification/pull/804#issuecomment-1343022965:

I'm fine putting the the date string, like 2022-11-01, but putting a time with offset is fundamentally misguided (there are months with more than one UTC offset). Another possibility would be to include the time zone as its own column, rather than the offset. (That was actually in the initial commit for [this] PR :) ) Then you have two columns with identical values in each report, and you're reporting to the city their own time zone (which, uh, they know), but I'm not against it. (One could consider it evidence that the provider has followed the spec with regard to counting in the local time zone I suppose.) I am against the UTC offset.

schnuerle commented 1 year ago

I think the reason for having a time zone is to be clear about it for jurisdictions that have more than one time zone. A rarity now perhaps, but that was the reasoning.

Having 2022-11-01 as a date field is fine, and leaving the time zone out is fine too.

An open question for me is about keeping the duration in there. I would keep it, as it's a small field data-wise and we could explicitly make it flexible, with just a recommendation of month as a default.

wellorder commented 1 year ago

Jurisdictions with more than one time zone...sends a shiver down my spine to think about. I know they exist, but they're enough to make a data engineer exclusively use UTC :) I'll note that the existing spec does not address that at all, so I don't think that was the initial intent. (What's the offset in place for the majority of the month in a market where two offsets were in place the same amount of time?)

Just to be sure I understand the current proposal:

schnuerle commented 1 year ago
  • a date column with format YYYY-MM-DD
  • a duration column with P1M as its value
  • no time_zone column I'm fine with this, and I can update the PR if that sounds correct.

I think this looks like the way forward to me too. Thank you!

You may also have to resolve the conflicts with provider_id from the previous PR merge, @wellorder.

wellorder commented 1 year ago

Okay, this is updated. It's a pretty small change after all the discussion, but one that will save a lot of confusion I think. 😄