opendatamesh-initiative / odm-specification-dpdescriptor

The Data Product Descriptor Specification (DPDS) Repository
https://dpds.opendatamesh.org/
Apache License 2.0

The `id` property description is inconsistent #66

Open andrea-gioia opened 1 month ago

andrea-gioia commented 1 month ago

Problem:

The specification states that the value of the id property is a UUIDv3 built by hashing the fullyQualifiedName with the SHA-1 algorithm.

According to RFC 4122, the difference between UUIDv3 and UUIDv5 lies in the hashing algorithm used: UUIDv3 hashes the namespace and name with MD5, while UUIDv5 uses SHA-1.

This is inconsistent. If we decide to use a UUIDv3 value for the id property, then we must declare MD5 as the hashing algorithm. If instead we decide to use the SHA-1 hashing algorithm, then we must declare that the id property is a UUIDv5.
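To make the distinction concrete, here is a minimal Python sketch of name-based UUID generation. The helper `name_based_uuid` and the sample name are hypothetical and used only for illustration; the standard library already offers `uuid.uuid3` and `uuid.uuid5`, which should produce the same values.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Hypothetical helper: build a name-based UUID (version 3 or 5)."""
    if version not in (3, 5):
        raise ValueError("name-based UUIDs are version 3 (MD5) or 5 (SHA-1)")
    # The only difference between the two versions is the hash function.
    hasher = hashlib.md5 if version == 3 else hashlib.sha1
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the first 16 bytes; uuid.UUID sets the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=version)


# Sample inputs, chosen only for illustration.
namespace = uuid.NAMESPACE_DNS
name = "tripExecution:1"

v3 = name_based_uuid(namespace, name, 3)
v5 = name_based_uuid(namespace, name, 5)

# Same inputs, different hash, therefore different identifiers.
print(v3, v3.version)
print(v5, v5.version)
```

Note that both versions take a namespace UUID and a name as input, so a value described as "UUIDv3 with SHA-1" cannot be produced by a standard generator.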

Moreover, it isn't clear how to generate the UUID using only the fullyQualifiedName. UUID generation requires both the name of the object and its namespace. Both pieces of information can be derived from the fullyQualifiedName, but it is not clearly explained how.

Solution:

RFC 4122 suggests using SHA-1 as the preferred hashing function whenever possible. So, to resolve the inconsistency described here, we propose to adopt UUIDv5 in place of UUIDv3. We also need to specify how to decompose the fullyQualifiedName into name and namespace to generate the UUIDv5 value. We propose to use the mesh-namespace part of the fullyQualifiedName as the namespace, and the product name plus its major version number, separated by a colon, as the name.
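The proposed decomposition could look roughly like the Python sketch below. It rests on several assumptions that are not settled in this issue: the fullyQualifiedName is assumed to follow the layout `urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{major-version}`, the sample value is hypothetical, and, since a UUIDv5 namespace must itself be a UUID, the mesh-namespace string is turned into one with `uuid5(NAMESPACE_DNS, ...)`. That last step is only one possible choice, not something the specification defines.

```python
import uuid


def split_fqn(fqn: str) -> tuple[str, str]:
    """Split a fullyQualifiedName into (mesh_namespace, name).

    Assumed layout (not confirmed by this issue):
    urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{major-version}
    """
    parts = fqn.split(":")
    if len(parts) != 6 or parts[:2] != ["urn", "dpds"] or parts[3] != "dataproducts":
        raise ValueError(f"unexpected fullyQualifiedName layout: {fqn!r}")
    mesh_namespace, product_name, major_version = parts[2], parts[4], parts[5]
    # Name = product name plus major version, separated by a colon.
    return mesh_namespace, f"{product_name}:{major_version}"


def descriptor_id(fqn: str) -> uuid.UUID:
    """Hypothetical helper: derive a UUIDv5 id from a fullyQualifiedName."""
    mesh_namespace, name = split_fqn(fqn)
    # Assumption: map the mesh-namespace string to a namespace UUID first.
    namespace_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, mesh_namespace)
    return uuid.uuid5(namespace_uuid, name)


# Hypothetical sample value, for illustration only.
fqn = "urn:dpds:it.quantyca:dataproducts:tripExecution:1"
print(split_fqn(fqn))
print(descriptor_id(fqn))
```

Because the major version is part of the name, two major versions of the same product get different ids, while repeated runs on the same fullyQualifiedName always yield the same id.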

Other considerations:

  1. It could be useful to add a namespace property to the descriptor root entity.
  2. In general, we always talk about data products, but in reality a descriptor document describes a specific version of a data product, not the product in general. We need to clarify this important concept to avoid ambiguity in the future.

andrea-gioia commented 3 weeks ago

In version 1.0.0, the description of every id property has been changed to refer to UUIDv5 in place of UUIDv3. The issue is still open, however, because the ambiguity about the namespace remains.