usdot-jpo-ode / wzdx

The Work Zone Data Exchange (WZDx) Specification aims to make harmonized work zone data provided by infrastructure owners and operators (IOOs) available for third party use, making travel on public roads safer and more efficient through ubiquitous access to data on work zone activity.
Creative Commons Zero v1.0 Universal

Suggestion - WZDX Selected Units of Measurement - allow both English and Metric for interoperability #231

Closed devorah-ql closed 2 years ago

devorah-ql commented 2 years ago

Summary

WZDX currently appears to be built upon metric units - kph, km, etc. However, in order for WZDX to be an international, interoperable standard, wouldn't it make more sense to simply allow data providers to specify their own units? This would allow either English or metric units to be used, based on the location and application in question.

Motivation

Restricting WZDX to metric units seems limiting. In the U.S., where WZDX is first being standardized, English units of measurement such as mph are standard, and so many agencies that may wish to use the data from the feed will be operating in English units. For example, I imagine most DOTs would like to receive and share their data in miles and mph, not in kilometers per hour, which are not units they use. That is only a minor conversion if it is a single value, but smart work zone systems and connected devices and vehicles may end up processing millions of lines of data just for a single device. So, it seems cumbersome and strange that a provider would need to convert millions upon millions of lines of data from mph to kph, only to have the receiving agency need to convert it back once again. And this also creates rounding errors switching back and forth. This is just one example, but I can imagine many applications where the main producers and receivers of this data are both operating in English units, so it seems sensible to remain with them.
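To make the rounding concern concrete, here is a minimal Python sketch of the mph → kph → mph round trip described above. The function names and the choice of rounding to whole units are assumptions for illustration only; nothing here comes from the WZDx specification.

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def mph_to_kph(mph: float) -> float:
    return mph * KM_PER_MILE


def kph_to_mph(kph: float) -> float:
    return kph / KM_PER_MILE


def round_trip(mph: float, digits: int = 0) -> float:
    """Producer converts mph to kph and rounds; consumer converts back and rounds."""
    kph_in_feed = round(mph_to_kph(mph), digits)
    return round(kph_to_mph(kph_in_feed), digits)


def worst_round_trip_error(readings_mph, digits_in_feed: int = 0) -> float:
    """Largest absolute error (in mph) when only the feed value is rounded
    and the consumer converts back without rounding."""
    return max(
        abs(kph_to_mph(round(mph_to_kph(m), digits_in_feed)) - m)
        for m in readings_mph
    )
```

Under these assumptions, a whole-number value such as a 55 mph limit survives the round trip when both sides round to whole units, while a fractional reading carried as whole kph can be off by up to about 0.31 mph (half a kph). Whether that matters depends on the application.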

It seems that it would give the feed much greater flexibility and international interoperability, simply to have a field where units are specified.

Proposed Changes

Recommend that the specification allow either English or metric units of measurement, and add a field in the feed where the units are specified. This will allow users to use the appropriate measurements for the area and application they are operating in.

sergebeaudry commented 2 years ago

No objection. This will allow each country to work in its native format and will allow data exchange across the two unit systems. For us as a data producer, our roadside equipment such as radar is de facto in KPH and we are used to conversion, but I can understand that some are not and do not want to manage such a process.

j-d-b commented 2 years ago

FYI, this discussion has already happened, see #169 and #174.

Following presentation to and input from subgroup members, the team decided that WZDx choosing a single unit is the cleanest approach. Converting between speed units for presentation purposes is not an issue. Machines do not care what unit they work with.

We will bring this issue up in one of the next two subgroup meetings for another round of discussion, but it is unlikely this will be changed. It is simpler to have the specification define the standard unit for each property. This is done by the TMDD specification and by data producers such as Ver-Mac and GM. They use KPH internally then convert to the desired unit of the end user.

devorah-ql commented 2 years ago

@j-d-b Thanks for the reply. Yes, I saw issue #169. However, that discussion was with regard to the value of a variable speed limit, which is a single value, not for the units of the entire feed, which I would consider a very different scope, as it affects all future data streams related to the feed, which will be numerous.

Respectfully, I can't agree that choosing a single unit, which is not the unit of the country WZDX is initiating from, is the cleanest or simplest approach. The only thing it would eliminate is a simple one-line entry on the feed, where a user would specify their units, based on the local standard. But what it creates is massive amounts of unnecessary data conversion and storage on the other end. For example, on the smart work zone data feed, a DOT within the US will require data from smart work zone providers, and will want that data in English units, since KPH is not applicable to their operations. So the smart work zone provider will need to convert all data into metric and feed it to the DOT, which will then have to convert it back again. And when there are questions about any data provided in the feed, it will have to be analyzed in the kph it was transmitted in, while communicated in mph, since that is what everyone is actually using. This seems inherently messy and complex to me, not clean.

Again, this is not significant when one is thinking on the scale of a single measurement. But we are talking about big data here, where one set of devices will generate millions and millions of lines of data.

Understood that Ver-Mac and GM may use kph internally and convert to other units. However, the idea is for this to be an international standard that can be used by all different agencies and data providers. So it seems to me that more flexibility will provide more utility here. My company provides many data feeds to DOTs within the US and they always require the data in English units, since that is what they operate on. Why not allow for this?

j-d-b commented 2 years ago

@devorah-ql there are several downsides and non-trivial added complexity that come from adding the ability to specify units:

  1. It is not straightforward where to specify the units;
  2. We may have to add and maintain a variety of enumerated types listing every unit option that anyone wants;
  3. Processing software has to check the units property and have conditional logic for handling each type of unit.

For a few examples of each, using the reduced_speed_limit concept as an example:

1. Where and how to specify units

One way is to add a property for specifying the unit next to each property that needs a unit. For example, with reduced_speed_limit we could have added reduced_speed_limit_units which is restricted to a new "SpeedUnits" enumerated type with some set of options including KPH and MPH (and more?). However, then we'd need a new property for every property that used speed units.

Another option, as you suggest, is to have the unit for speed specified at the feed-level, let's call it speed_unit for this example.

A first negative is that it isn't clear what properties the speed unit applies to. The specification documentation would have to specify that, which is hard to maintain and requires processing code to have these "rules" coded in it somehow so it can know to check the feed info for the unit when parsing certain properties elsewhere on objects in the feed. I think that is unclear coupling and poor design.

Also, if we're talking about humans parsing a feed, which is the only case where I see your argument as relevant because machines don't care what unit they are working in, you would see the value in the reduced_speed_limit property and then have to check a different object that isn't nearby (visually) to know what unit it is. It is also easier to forget the unit than if the property name says the unit, like it does with reduced_speed_limit_kph in WZDx v4.0.

There are variations on these options, but it's hard to get around the negatives.
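The options discussed above can be sketched side by side. This is a minimal Python illustration, not WZDx itself: `reduced_speed_limit_kph` is the WZDx v4.0 property mentioned above, while `reduced_speed_limit_units`, `speed_unit`, the `feed_info` placement, and the `SPEED_PROPERTIES` rule table are hypothetical names used only for this example.

```python
# Option A (hypothetical): a companion units property next to each value.
per_property = {
    "reduced_speed_limit": 45,
    "reduced_speed_limit_units": "mph",
}

# Option B (hypothetical): one unit declared once at the feed level.
feed_level = {
    "feed_info": {"speed_unit": "mph"},
    "features": [{"properties": {"reduced_speed_limit": 45}}],
}

# Option C (the WZDx v4.0 approach described above): unit in the property name.
unit_in_name = {"reduced_speed_limit_kph": 72}

# With option B, a consumer needs out-of-band "rules" stating which
# properties the feed-level unit governs. This set is an assumption.
SPEED_PROPERTIES = {"reduced_speed_limit"}


def unit_for(feed: dict, property_name: str):
    """Return the unit governing a property under option B, or None if none applies."""
    if property_name in SPEED_PROPERTIES:
        return feed["feed_info"]["speed_unit"]
    return None
```

The `SPEED_PROPERTIES` table is the coupling in question: it lives in consumer code and documentation rather than in the data, and has to be kept in step with the specification.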

2. Many enumerated types with boundless options

I'll use the "SpeedUnits" enumerated type as an example again: it would have to contain many options depending on what consumers/producers use (in addition to KPH and MPH) to avoid anyone having to convert the unit.

3. Processing software has to check the units property and have conditional logic for handling each type of unit.

Parsing reduced_speed_limit_kph from a WZDx feed is trivial because you know the unit from the property name/description and thus do not need any more information to know what to do with it. If the unit was specified anywhere else and there were multiple options, processing software would have to check multiple properties and have conditional logic that parses it differently depending on the unit. Again, this is more added complexity.
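As a rough sketch of the difference in consumer code, the following Python compares reading a fixed-unit property with reading a value whose unit is declared elsewhere. Only `reduced_speed_limit_kph` is named in this thread as an actual WZDx property; `reduced_speed_limit`, `speed_unit`, and both function names are assumptions for illustration.

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def read_fixed_unit(properties: dict) -> float:
    """Unit is implied by the property name, so no other lookup is needed."""
    return float(properties["reduced_speed_limit_kph"])


def read_declared_unit(properties: dict, feed_info: dict) -> float:
    """Unit is declared at the feed level (hypothetical), so branch on it."""
    value = float(properties["reduced_speed_limit"])
    unit = feed_info.get("speed_unit")
    if unit == "kph":
        return value
    if unit == "mph":
        return value * KM_PER_MILE
    # Every additional allowed unit needs another branch; an unknown or
    # missing unit is rejected rather than guessed.
    raise ValueError(f"unsupported speed unit: {unit!r}")
```

Both functions return kilometers per hour. The second needs a second input and has a failure mode the first lacks.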

devorah-ql commented 2 years ago

@j-d-b. Thank you for explaining the complexity and the trade-offs here. That is helpful to understand.

devorah-ql commented 2 years ago

@j-d-b. Thanks for explaining the potential issues. It is helpful to understand the details and understand the concerns about complexity.

I discussed this along with a few of my developers, and here are some thoughts:

  1. It is not straightforward where to specify the units;

The developers think this should be pretty straightforward. It would be standard that this type of information would get specified in the root node, and based on that would apply to all the children. For example, it could be specified in the basic info of the WZDXfeed or the RoadRestrictionFeed, or the SWZDeviceFeed. Basic info already has to be provided there, so units could be included as well. And then based on this, those units would apply to all children. The developers said this type of setup is common with JSON feeds, and not complicated to process.

And agreed, the specification document would have to detail what those units are. But it will have to do that anyways when the system is running on metric. This would just involve adding English units as well, which should not be too onerous. As far as parsing, the system will have to do the same things whether metric, English etc. It will still have to know that speed values are kph, while distance values are kilometers, etc. So either way, the units of the field have to be documented, and then taken in by the processor.

  2. Many enumerated types with boundless options: I'll use the "SpeedUnits" enumerated type as an example again: it would have to contain many options depending on what consumers/producers use (in addition to KPH and MPH) to avoid anyone having to convert the unit.

I can understand the hesitance to open up to boundless unit variation, and agree that there doesn't seem to be a compelling reason to do so. It seems it would be quite reasonable to restrict the feed to Metric and English units, which covers much of the world. I don't think that all minor edge cases need to be covered. But that doesn't stop us from covering the major units of the U.S., which is where the feed is being developed, alongside metric, which also covers much of the world.

  3. Processing software has to check the units property and have conditional logic for handling each type of unit.

In many places, the end-users will be using English units. So this means that the processing software is already going to have to understand units and do conversions. So, I can't say that seems any simpler than checking units and processing based on the ones specified. I double-checked this with my developers and they confirmed that checking units and processing based on them is simple programmatically speaking. Those who have the skill to process a JSON feed would presumably be able to process units as well.

In terms of the desire to specify units in general, agreed that machines don't care what unit they are working in. But the data will be used by humans, who do indeed care what units they are functioning in. English unit users want to be told their data in English and metric users want it in metric. Also, ultimately, the data from the feed needs to be analyzed by humans, because of the Garbage In/Garbage Out issue. If everyone is feeding data into the system, questions will have to come up - is this good data? What is the meaning of the data? Etc. So, humans will always end up coming into the analysis at some point to ensure accuracy.

In the end data can be converted. But IMO, it just seems more straightforward for English users to be able to send their data back and forth in the same format, instead of having to convert it back and forth and back again. Then conversion can just be done by those who need it.

dxpack commented 2 years ago

Why have speed units been standardized on KPH, but roadway distance markers are in miles (WorkZoneRoadEvent, beginning_milepost and ending_milepost)?

I don't see any issue, in terms of computational complexity, with either fixing speed units to KPH or providing a root-level localization property. Neither is meaningfully complex and neither is free (convert a million KPH values to MPH, or look up a million localization properties - it's six of one, half a dozen of the other). But having a specification that mixes localization classifications is confusing.

To complicate matters further, some countries (just one?) appear to use a mixture of measurement units. I believe the UK is mostly metric, but roadway distance and speed are still in miles. I don't know if that is exclusively colloquial or if it is official (which units do UK traffic engineers work in?).

Aside, the non-metric classification is typically called "Imperial", not "English".

j-d-b commented 2 years ago

@dxpack yes, the milepost property naming is an issue we have discussed, though it looks like it never made it into an Issue. Are you able to open an issue for making the milepost naming not specific to a unit, such as "highway location marker"? I don't think requiring a unit for this makes sense, because mileposts aren't exact measured distances, they are distance-based reference points.

As for the argument for allowing a producer to specify a unit, I am personally against it for the reasons above, primarily because I don't see any benefit in it and it adds complexity (at least another property and enumerated type). We can discuss it in the next WZDx Spec Update Subgroup meeting with all the members to come to a conclusion.

dxpack commented 2 years ago

Issue: #243 beginning_milepost / ending_milepost (WorkZoneRoadEvent / DetourRoadEvent): not internationally descriptive

j-d-b commented 2 years ago

Closing due to low interest. The proposed ability to specify a unit is a significant breaking change (we'd have to rename properties) with no functional value.