usdot-jpo-ode / wzdx

The Work Zone Data Exchange (WZDx) Specification aims to make harmonized work zone data provided by infrastructure owners and operators (IOOs) available for third party use, making travel on public roads safer and more efficient through ubiquitous access to data on work zone activity.
Creative Commons Zero v1.0 Universal

Require UUIDs for feature IDs #207

Open j-d-b opened 2 years ago

j-d-b commented 2 years ago

A Universally Unique IDentifier (UUID), also known as a GUID and defined in RFC 4122, is a 128-bit ID that can guarantee uniqueness across space and time.

Here's an example of a UUID: 6ba7b810-9dad-11d1-80b4-00c04fd430c8

Currently, the WZDx RoadEventFeature id is just a string. Collisions between road event IDs from different producers are likely, which makes it difficult to aggregate feeds and to relate objects that reference a road event ID (such as within a Relationship, e.g. connecting detours to work zones) to the road event they refer to. Within a single data source there is no issue, but across feeds from multiple producers the ID alone is not sufficient to reference a road event. It would be much easier if it were, and a UUID would enable that.

From my experience working with MassDOT and Ver-Mac, this change would not be burdensome as the content of a RoadEventFeature ID does not have meaning and by its description in the WZDx specification is not intended to. Thus the requirement of a UUID would just require a change in how IDs are generated. There are many libraries for generating UUIDs, so this is trivial.
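As a rough illustration of how small the change is on the producer side, here is a minimal sketch using Python's standard `uuid` module. The `new_feature_id` helper and the trimmed-down `feature` dictionary are hypothetical, not part of the WZDx specification; a real RoadEventFeature carries many more properties.

```python
import uuid


def new_feature_id() -> str:
    """Return a random (version 4) UUID as a lowercase, hyphenated string."""
    return str(uuid.uuid4())


# Hypothetical, heavily trimmed feature used only to show where the ID goes.
feature = {
    "id": new_feature_id(),
    "type": "Feature",
    "properties": {},
    "geometry": None,
}
```

A random ID like this would need to be stored by the producer so that the same feature keeps the same ID across feed updates.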

Currently, the only feature in WZDx with an identifier is the RoadEventFeature. However, active PRs seek to add a FieldDeviceFeature (see #195, #208), and there has been discussion about a RoadFeature. I think all feature IDs should be UUIDs which would greatly facilitate referencing features without the context of the original feed or data source they came from, historical reporting and data warehousing, and relating features.

DeraldDudley commented 2 years ago

Brilliant!

j-d-b commented 2 years ago

For the v4.1 release, this can be implemented as a recommendation through the description of the identifier properties and shown in all examples. Specifically, the following properties would recommend using a UUID:

| Property | Object |
| --- | --- |
| id | RoadEventFeature |
| id | FieldDeviceFeature |
| data_source_id | FeedDataSource |

The use of a UUID could eventually be required via a future major WZDx release. Having it recommended would be a good stepping stone towards this goal.
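While UUIDs are only recommended, a consumer or producer might want to check which identifiers in a feed already follow the recommendation. The sketch below is one possible check, not an official validator. It assumes the feed is a GeoJSON-style dictionary with a `features` list and a `feed_info.data_sources` list; that layout, and the helper names `is_uuid` and `non_uuid_ids`, are assumptions made for illustration.

```python
import uuid


def is_uuid(value: object) -> bool:
    """True if value is a string in canonical 8-4-4-4-12 hyphenated UUID form."""
    if not isinstance(value, str):
        return False
    try:
        # uuid.UUID also accepts braces, a urn: prefix, and unhyphenated hex,
        # so compare against the canonical string to reject those forms.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def non_uuid_ids(feed: dict) -> list:
    """List the identifiers in a feed that are not UUIDs (assumed feed layout)."""
    bad = []
    for feature in feed.get("features", []):
        if not is_uuid(feature.get("id")):
            bad.append(("feature id", feature.get("id")))
    for source in feed.get("feed_info", {}).get("data_sources", []):
        if not is_uuid(source.get("data_source_id")):
            bad.append(("data_source_id", source.get("data_source_id")))
    return bad
```

For example, a feature whose ID is `OpenTMS-Event2702170538` would be reported, while one whose ID is `6ba7b810-9dad-11d1-80b4-00c04fd430c8` would pass.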

j-d-b commented 2 years ago

Based on discussion in the 2022-02-23 spec update subgroup meeting, members are on board with the transition to UUIDs. However, I gathered it would be helpful to add a name property to the RoadEventCoreDetails for providing a human-readable "name" for the road event, which is what some current WZDx users were using the ID for. The FieldDeviceCoreDetails already has the name property, defined as:

A human-readable name for the field device.

With the addition of name there is no lost functionality when ID is required to be a UUID.

j-d-b commented 2 years ago

UUIDs are recommended in v4.1 from #278.

This issue will stay open for requiring UUIDs in v5.0.

jacob6838 commented 1 year ago

At CDOT, we currently generate WZDx messages by translating a different message to WZDx. That higher-level message only has a string identifier, which is unique among all CDOT messages (of the form OpenTMS-Event2702170538, OpenTMS-Event2843552682, ...). I am now updating this translator to 4.1 and need to determine how to handle the identifier, preferably as a UUID. I believe a goal should be to keep the same identifier for each consecutive message update, meaning that this UUID should persist across updates. Currently, the only simple way to do that (for us) would be to use seeded UUIDs, seeded from that CDOT identifier (OpenTMS-Event2702170538). This solves our problem, but may not be a good long-term strategy, as seeding removes the "guarantee" of uniqueness of the UUID. I am fairly confident that these identifiers will be globally unique, at least in the short term. Is seeding UUIDs with our own unique IDs a valid solution? We have some other options, but this is the simplest by far.

AdamICone commented 1 year ago

@jacob6838, I think there are two key points you bring up:

  1. Yes, IDs (UUID or otherwise) should be maintained for the same feature (both "RoadEvent" and "Device") for as long as it's the same functional feature. Which attributes define a functional feature will vary based on specifics, but if your internal system would continue to use the same ID (i.e. the feature-defining attributes haven't changed), that feature in WZDx should also keep using the same ID. (Note: the internal system ID doesn't have to be the same as the WZDx ID.)
  2. There are 4 practical versions of UUIDs (ignoring version-2), and pseudo-random number generated is only one of them. As long as the UUID is generated correctly, there's no functional difference between the different versions - a UUID is a UUID.

Specifically for this case, I would use a Version 5 UUID - see RFC 4122, section 4.3 for some more details about name seeded UUIDs (https://www.rfc-editor.org/rfc/rfc4122#section-4.3). In short, the process is to convert a SHA1 hash of a string to a UUID: genUUID_v5(SHA1(namespaceUUID + name)) – there are a number of online generators, e.g. https://www.uuidtools.com/v5. So, I would start by generating a UUID, and saving it as the organization/namespace UUID (really doesn’t matter how this is generated, just store it to always use the same one). Then, you can append your internally unique name (i.e. "OpenTMS-Event2702170538") to generate the globally unique UUID for the feature – which can always be re-generated by you using the same organization/namespace UUID and the same internally unique name.

Also see the following stackoverflow question – the top answer includes some good pseudo-code for an implementation: https://stackoverflow.com/questions/10867405/generating-v5-uuid-what-is-name-and-namespace
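The approach described above can be sketched with Python's standard `uuid.uuid5`. The `ORG_NAMESPACE` value below is a made-up placeholder, and `wzdx_id` and `uuid5_by_hand` are hypothetical helper names; an organization would generate its own namespace UUID once, store it, and reuse it. The second function spells out the SHA-1 step from RFC 4122 section 4.3 only to show what the library call does.

```python
import hashlib
import uuid

# Placeholder namespace UUID (an assumption for this sketch). Generate your
# own once, e.g. with uuid.uuid4(), then store it and always reuse it.
ORG_NAMESPACE = uuid.UUID("0f8fad5b-d9cb-469f-a165-70867728950e")


def wzdx_id(internal_id: str) -> str:
    """Derive a stable version 5 UUID from an internally unique name."""
    return str(uuid.uuid5(ORG_NAMESPACE, internal_id))


def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Same result as uuid.uuid5, written out step by step."""
    # SHA-1 over the namespace's 16 raw bytes followed by the UTF-8 name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])         # keep the first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x50      # set the version field to 5
    raw[8] = (raw[8] & 0x3F) | 0x80      # set the RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(raw))
```

Calling `wzdx_id("OpenTMS-Event2702170538")` always returns the same UUID for the same namespace, so the ID persists across message updates without being stored.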

jacob6838 commented 1 year ago

@AdamICone That makes perfect sense, UUID version 5 supports this use case perfectly. Thanks!