gs1 / WebVoc

GS1 Web vocabulary development site

duplicate words/phrases in enumeration values #42

Open VladimirAlexiev opened 1 year ago

VladimirAlexiev commented 1 year ago

(Noticed in #32) "Pecan Nut and Pecan Nut"@en

All enum values should be checked for duplications.

PS: I'm curious to learn about the data munging process that led to this ;-)
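A check along these lines could be scripted. The following is only a minimal sketch, assuming the enum labels have already been extracted into a plain dict keyed by code; the function names and the splitting heuristic (commas, slashes, "and", "or") are illustrative assumptions, not part of the GS1 tooling. It flags a label only when two of its parts are identical, so "Pecan Nut and Pecan Nut" is reported while "Pecan Nut and Pecan Nut Products" is not.

```python
import re

# Assumption: parts of an enum label are joined by commas, slashes, "and" or "or".
_SEPARATORS = re.compile(r"\s*(?:,|/|\band\b|\bor\b)\s*", re.IGNORECASE)


def duplicated_parts(label: str) -> list[str]:
    """Return the parts of a label that occur more than once (lower-cased)."""
    parts = [" ".join(p.lower().split()) for p in _SEPARATORS.split(label)]
    seen: set[str] = set()
    duplicates: list[str] = []
    for part in parts:
        if not part:
            continue  # skip empty fragments produced by adjacent separators
        if part in seen and part not in duplicates:
            duplicates.append(part)
        seen.add(part)
    return duplicates


def find_suspect_labels(labels: dict[str, str]) -> dict[str, list[str]]:
    """Map each code whose label repeats a part to the repeated parts."""
    suspects = {}
    for code, label in labels.items():
        duplicates = duplicated_parts(label)
        if duplicates:
            suspects[code] = duplicates
    return suspects


if __name__ == "__main__":
    # Hypothetical codes; only the first label is taken from this issue.
    sample = {
        "CODE_A": "Pecan Nut and Pecan Nut",
        "CODE_B": "Pecan Nut and Pecan Nut Products",
    }
    print(find_suspect_labels(sample))
```

A heuristic like this only catches exact repeats after whitespace and case normalisation, so any hits would still need a manual review.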

mgh128 commented 1 year ago

If I remember correctly, there was a work request to update https://gs1.org/voc/AllergenTypeCode to align it with what was, at the time, the current snapshot of the corresponding code list in the GS1 Global Data Dictionary (GDD). You can see the entry at http://apps.gs1.org/GDD/Lists/Code%20List/CLDispForm.aspx?ID=27109 - and if you're already laughing at the decidedly uncool URL generated by some content management system, keep in mind that the GDD still uses URNs (!) for its semantic resources, such as urn:gs1:gdd:cl:AllergenTypeCode:SP

We don't currently have a regular process to ensure that updates to the source code lists in the GDD or the GDSN data model are reflected in the GS1 Web vocabulary and that the two stay synchronised. In this case, it appears that the description should have been "Pecan Nut and Pecan Nut Products" - but maybe it was already truncated in the spreadsheet I received.

Ideally, the tools for managing updates to the GDD would automatically generate the corresponding updates for the GS1 Web vocabulary and trigger a work request each time the GDD is updated. We're aware that we need an overview of which source code lists correspond to code lists in the GS1 Web vocabulary - and a better way to automate the synchronisation. In some cases, the source lists are actually in the GPC dataset, typically within the lists of attribute values - and it's also possible that there is imperfect synchronisation between related code lists in GPC and the GDD.
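As a rough illustration of what such a synchronisation check might look like, here is a minimal sketch. It assumes both the source code list and the Web vocabulary code list have already been exported as dicts mapping code to description; how they would be extracted from the GDD, GPC or the vocabulary is not shown, and `compare_code_lists` is a hypothetical helper, not an existing tool.

```python
def _normalise(text: str) -> str:
    """Collapse whitespace and ignore case so cosmetic differences are not reported."""
    return " ".join(text.split()).casefold()


def compare_code_lists(
    source: dict[str, str], vocabulary: dict[str, str]
) -> dict[str, list[str]]:
    """Report codes that are missing on either side or whose descriptions differ."""
    shared = source.keys() & vocabulary.keys()
    return {
        "missing_in_vocabulary": sorted(source.keys() - vocabulary.keys()),
        "missing_in_source": sorted(vocabulary.keys() - source.keys()),
        "description_mismatch": sorted(
            code
            for code in shared
            if _normalise(source[code]) != _normalise(vocabulary[code])
        ),
    }


if __name__ == "__main__":
    # Hypothetical codes and data, for illustration only.
    source_list = {
        "CODE_A": "Pecan Nut and Pecan Nut Products",
        "CODE_B": "Some other allergen",
    }
    vocabulary_list = {
        "CODE_A": "Pecan Nut and Pecan Nut",
        "CODE_C": "Entry only in the vocabulary",
    }
    print(compare_code_lists(source_list, vocabulary_list))
```

A non-empty report could then be what triggers the work request, rather than relying on someone noticing the difference by hand.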