gs1 / WebVoc

GS1 Web vocabulary development site
Apache License 2.0

GS1 classification problems (GPC and additional) #36

Open VladimirAlexiev opened 2 years ago

VladimirAlexiev commented 2 years ago

(in reply to https://github.com/gs1/WebVoc/issues/35#issuecomment-982656653 by @mgh128)

Props gpcCategoryCode, gpcCategoryDescription have these shortcomings:

AdditionalProductClassificationDetails is better because it allows you to use someone else's classification URLs without duplicating info locally. In fact I'm tempted to use it for GPC ;-) But:

VladimirAlexiev commented 2 years ago

@philarcher Here's what I came up with to capture GPC attribute/value pairs

@prefix gs1:    <https://www.gs1.org/voc/> .
@prefix schema: <https://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# A batch of the product, linked to its GTIN-level description via schema:model
<https://id.gs1.org/01/09520876543219/10/ABC123> a gs1:ProductBatch;
  schema:model <https://id.gs1.org/01/09520876543219>;
  gs1:gtin "09520876543219";
  gs1:hasBatchLotNumber "ABC123";
  gs1:productionDateTime "2021-04-04T22:30:00"^^xsd:dateTime;
  gs1:bestBeforeDate "2021-10-04"^^xsd:date;
  gs1:productName "Red Round Tomatoes";
  gs1:colourDescription "RED";
  gs1:countryOfOrigin <https://example.org/resource/iso3166/CN>;
  # GPC brick
  gs1:gpcCategoryCode "10006165";
  gs1:gpcCategoryDescription "Tomatoes Round";
  # GPC attribute/value pairs, each described by a code and a description
  schema:additionalProperty
    [a schema:PropertyValue;
     schema:propertyID [gs1:gpcCategoryCode "20000743"; gs1:gpcCategoryDescription "Country/Zone of Origin"];
     schema:value      [gs1:gpcCategoryCode "30014651"; gs1:gpcCategoryDescription "CHINA"]],
    [a schema:PropertyValue;
     schema:propertyID [gs1:gpcCategoryCode "20002772"; gs1:gpcCategoryDescription "Colour of Tomatoes"];
     schema:value      [gs1:gpcCategoryCode "30001983"; gs1:gpcCategoryDescription "RED"]],
    [a schema:PropertyValue;
     schema:propertyID [gs1:gpcCategoryCode "20002739"; gs1:gpcCategoryDescription "Growing Method"];
     schema:value      [gs1:gpcCategoryCode "30014683"; gs1:gpcCategoryDescription "ORGANIC"]],
    [a schema:PropertyValue;
     schema:propertyID [gs1:gpcCategoryCode "20002737"; gs1:gpcCategoryDescription "Quality (UNECE Standard)"];
     schema:value      [gs1:gpcCategoryCode "30014608"; gs1:gpcCategoryDescription "CLASS I"]].

COMMENTS PLEASE!

oldskeptic commented 2 years ago

@VladimirAlexiev What are your thoughts on recording the tomato variety? PLUs are variety-based, but GTINs are not necessarily so.

VladimirAlexiev commented 2 years ago

@oldskeptic No clue... Would https://www.gs1.org/voc/consumerProductVariant fit?

mgh128 commented 2 years ago

@VladimirAlexiev @oldskeptic I'm not sure that https://www.gs1.org/voc/consumerProductVariant is really intended to be used to express different varieties of tomatoes or citrus fruit. It might be better to have a dedicated property within https://www.gs1.org/voc/FruitsVegetables for this purpose.

mgh128 commented 2 years ago

@philarcher Here's what I came up with to capture GPC attribute/value pairs

  • uses gs1:gpcCategoryCode, gs1:gpcCategoryDescription not just for GPC classes, but also for attributes

    • abuses the domain of these props, which is Product
  • mixes GS1 and Schema in an unholy way

    • schema:propertyID is surely not meant to be described with attributes itself. @danbri @RichardWallis is this sacrilegious?
  • notice the schema:model link. There's gs1:gtin but that's only a string.

    • admittedly, "model of a tomato" sounds silly

Hi @VladimirAlexiev

Please don't be offended, but I don't think we would recommend using https://www.gs1.org/voc/gpcCategoryCode and https://www.gs1.org/voc/gpcCategoryDescription in this way. When the definition of https://www.gs1.org/voc/gpcCategoryCode says "8-digit code (GPC Brick Value) specifying a product category according to the GS1 Global Product Classification (GPC) standard. For more information see https://www.gs1.org/gpc", it intends to restrict you to the 8-digit codes that are GPC Brick Values (mostly in the range 1xxxxxxx) - not the 8-digit codes that are GPC Attributes (generally 2xxxxxxx) or GPC Attribute Values (generally 3xxxxxxx). I realise that it is potentially confusing to newcomers to GPC that all of these are 8-digit codes, and we should probably improve the definition of https://www.gs1.org/voc/gpcCategoryCode to make clear that it is not intended to express a GPC Attribute code or a GPC Attribute Value code.
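
The brick-only restriction described above could also be checked mechanically. The following is a minimal SHACL sketch in Turtle, not an official GS1 shape: the ex: namespace and the shape name are hypothetical, the gs1: prefix simply follows the https://www.gs1.org/voc/ URLs used in this thread, and the pattern assumes every brick code starts with 1, which the paragraph above only says is mostly the case (hence a warning rather than a violation).

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix gs1: <https://www.gs1.org/voc/> .
@prefix ex:  <https://example.org/shapes/> .   # hypothetical namespace

# Applies to every node that carries a gs1:gpcCategoryCode value
ex:GpcBrickCodeShape a sh:NodeShape ;
  sh:targetSubjectsOf gs1:gpcCategoryCode ;
  sh:property [
    sh:path gs1:gpcCategoryCode ;
    # Assumption: brick codes are 8 digits starting with 1
    sh:pattern "^1[0-9]{7}$" ;
    sh:severity sh:Warning ;
    sh:message "gs1:gpcCategoryCode is expected to hold a GPC Brick Value (1xxxxxxx), not an attribute (2xxxxxxx) or attribute value (3xxxxxxx) code." ;
  ] .
```

A shape like this would accept the brick code 10006165 and raise a warning for codes such as 20000743 or 30014651 if they were stated with gs1:gpcCategoryCode.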

The new GPC browser at https://gpc-browser.gs1.org/ appears to be running quite slowly, and its search has several gaps. A search for '30014651' does not return China, and a search for '20000743' does not find the attribute named 'Country/Zone of Origin'. A search for 'China' returns a number of spurious results such as Echinacea and China juteplants, but not the attribute value 30014651 for China (the People's Republic Of). It also does not appear to provide direct URLs to look up each cryptic 8-digit code, whereas an earlier prototype I developed a few years ago ( https://mh1.eu/gpctest/10000028 ) did provide such lookups. I'll try to provide some constructive feedback to whichever team developed the new GPC browser. I wasn't involved in its development, though I did provide the link to my prototype at least 2 years ago, probably earlier.

When the GS1 Web vocabulary was developed, we took inspiration from a blend of some properties defined within the GDSN data model, combined with some attributes (and code list values) defined within GPC.

I accept that there are some gaps in the GS1 Web vocabulary but I'm struggling to understand why your example is not making use of some properties that are available. For example, instead of using GPC attribute 20000743 why not just use https://www.gs1.org/voc/countryOfOrigin ?

Why use GPC attribute 20002772 in addition to using https://www.gs1.org/voc/colourDescription ? Instead of using GPC attribute 20002739 why not use https://www.gs1.org/voc/growingMethod ?

You have already suggested some potential improvements to the keyword search feature but your example looks as though you didn't make as much use of it as you could have done.
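
A sketch of the substitutions suggested above, in Turtle. It reuses the identifiers and values from the example earlier in this thread; the gs1:GrowingMethodCode-ORGANIC value is the one mentioned in the follow-up reply, and the gs1: prefix IRI is an assumption based on the URLs quoted here. The thread names no dedicated property for the 'Quality (UNECE Standard)' attribute, so it is left out.

```turtle
@prefix gs1: <https://www.gs1.org/voc/> .

<https://id.gs1.org/01/09520876543219/10/ABC123> a gs1:ProductBatch ;
  gs1:gtin "09520876543219" ;
  gs1:productName "Red Round Tomatoes" ;
  # The GPC brick is the only GPC code expressed with gpcCategoryCode
  gs1:gpcCategoryCode "10006165" ;
  gs1:gpcCategoryDescription "Tomatoes Round" ;
  # Instead of GPC attribute 20000743 (Country/Zone of Origin)
  gs1:countryOfOrigin <https://example.org/resource/iso3166/CN> ;
  # Instead of GPC attribute 20002772 (Colour of Tomatoes)
  gs1:colourDescription "RED" ;
  # Instead of GPC attribute 20002739 (Growing Method)
  gs1:growingMethod gs1:GrowingMethodCode-ORGANIC .
```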

VladimirAlexiev commented 2 years ago

@mgh128 Thanks for the extra field, I've added gs1:growingMethod gs1:GrowingMethodCode-ORGANIC. But this doesn't answer how to capture GPC attributes. Could you please make a suggestion as a Turtle example?

Why would one need this:


Thanks for the links and info about GPC browsers! While you contact the GPC team, could you please plead with them to have per-entity URLs and later also Linked Data? Your browser has per-entity URLs, but only a bit of LD (relations are missing):


Then GS1 has further data models with more tons of descriptive props:

Then there's an "EDI Semantics" WG...

It seems there's a need to merge the various GS1 modeling and product description efforts and standards.

VladimirAlexiev commented 2 years ago

More defects of the https://gpc-browser.gs1.org/ search:

mgh128 commented 2 years ago

Hi @VladimirAlexiev

I've already e-mailed to request per-entity URLs as you suggest. When I get a reply I'll follow up with a request for Linked Data and content negotiation (I already hinted at that by asking whether the RESTful interface could provide the data in a machine-interpretable format as well as a human-readable one).

Regarding expression of GPC attributes within the GS1 Web vocabulary, we don't have a general-purpose mechanism for expressing "this other property defined elsewhere" with "this other code value defined elsewhere". Of course, if we had per-entity URLs for GPC bricks, GPC attributes and GPC attribute values (and for everything above the GPC brick in the GPC hierarchy), then you could just use those as Linked Data properties - even better if they were fully supported by online definitions with Linked Data.
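
To make the idea concrete: if such per-entity URLs existed, a GPC attribute could serve directly as a predicate and a GPC attribute value as its object. The sketch below is purely illustrative. The gpc: namespace is a placeholder on example.org, since this thread indicates that no such URLs are published yet, and the gs1: prefix IRI follows the URLs quoted in this thread.

```turtle
@prefix gs1: <https://www.gs1.org/voc/> .
@prefix gpc: <https://example.org/gpc/> .   # hypothetical per-entity URL space

<https://id.gs1.org/01/09520876543219/10/ABC123> a gs1:ProductBatch ;
  gpc:20000743 gpc:30014651 ;   # Country/Zone of Origin = CHINA
  gpc:20002772 gpc:30001983 ;   # Colour of Tomatoes = RED
  gpc:20002739 gpc:30014683 ;   # Growing Method = ORGANIC
  gpc:20002737 gpc:30014608 .   # Quality (UNECE Standard) = CLASS I
```

With resolvable URLs, the human-readable descriptions would not need to be repeated in the product data; they could be fetched from the definitions behind each URL.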

Yes, GS1 is aware of a need to merge its various semantic efforts. However, there are some fundamental structural challenges. Although there are some equivalent properties in both the GS1 Web vocabulary and the GDSN data model, in many cases the rdfs:domain formally differs: the GDSN data model uses a number of additional classes in its hierarchy, and those that appeared to serve no essential functional purpose (such as grouping repeatable sets of interdependent properties) were treated as redundant and flattened out of the GS1 Web vocabulary. Other challenges include the tendency for the GDSN data model to use code lists rather than specific named properties. For example, to express a telephone number in the GS1 Web vocabulary you'd use https://www.gs1.org/voc/telephone (analogous to https://schema.org/telephone ), but in the GDSN data model you'll find 'TELEPHONE' not as a named property but as a code value within the code list CommunicationChannelCode, which seems a far clunkier way of expressing attributes and values. Apparently this approach is common because (from what I heard recently) the process makes it easier or quicker to add an extra value to an existing code list than to introduce a new property (even if introducing a new property would actually be the right thing to do).
Then there's understandable reluctance to make disruptive changes to the GS1 Web vocabulary or the GDSN data model. So unless you have any better ideas, it's likely that the best we can manage is a unified model with sufficient annotations (via path expressions or similar) to serve multiple audiences (GS1 Web vocabulary users, GDSN users), similar to the approach in the EPCIS/CBV ontology, where we're using annotations to try to serve a Linked Data audience as well as a JSON-syntax audience.
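
The telephone contrast above can be illustrated side by side. This is only a sketch: GDSN is not an RDF model, so the second half is a hypothetical RDF rendering of the code-list style, with invented ex: terms. The contact IRIs and the phone number are placeholders, and the gs1: prefix IRI follows the URLs quoted in this thread.

```turtle
@prefix gs1: <https://www.gs1.org/voc/> .
@prefix ex:  <https://example.org/gdsn-style/> .   # hypothetical terms

# Named-property style (GS1 Web vocabulary): the property itself says "telephone"
<https://example.org/contact/1> gs1:telephone "+1-555-0100" .

# Code-list style (GDSN), rendered in RDF for comparison only:
# a generic channel whose kind is given by a code value
<https://example.org/contact/2>
  ex:communicationChannel [
    ex:communicationChannelCode "TELEPHONE" ;
    ex:communicationValue "+1-555-0100"
  ] .
```

The first form can be queried or mapped to schema:telephone by property alone, while the second needs a filter on the code value, which is one reason the rdfs:domain and structure of the two models are hard to align.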

VladimirAlexiev commented 2 years ago

@mgh128 Sounds like important work for a "cross-model mapping WG". Or does the mandate of "EDI Semantics" WG fit the bill?

mgh128 commented 2 years ago

Hi @VladimirAlexiev Yes, I already noted some of these ideas for potential improvement in the e-mail about the new GPC browser. I already pointed out that a keyword search for China finds Echinacea but doesn't find China (People's Republic Of). I agree that exclusion clauses should be considered, but they're not currently marked or structured separately in the underlying dataset, so at the moment this would depend on parsing text strings (not ideal, especially when the data itself is translated into multiple human languages).
There are further complications with the GPC dataset because the attribute values available for a specific attribute depend on the GPC brick that expresses the attribute, whereas such filtering would not usually be present in a Linked Data vocabulary; all possible values would be available for the attribute and we'd rely on the creator of the data not to use inappropriate values, such as specifying that the colour of the tomato is blue.

The "EDI Semantics" WG will probably need to consider operating at least two concurrent or consecutive subgroups or phases of work. There are some who appear to be interested only in harmonising existing definitions across standards but are not yet ready to consider W3C Linked Data standards. There are others who are primarily interested in helping with the use of W3C Linked Data standards but who aren't so obsessed with unified definitions. And there are people like you who potentially have an interest in both aspects but may prefer to offer best-practice guiding principles (like the feedback you provided on the EPCIS/CBV definitions), without actually processing each set of thousands of definitions. However, the cross-model mapping effort appears to be far beyond the current scope.

oldskeptic commented 2 years ago

I'd like to propose an object property like https://www.gs1.org/voc/varietal which is flexible enough to be used for fruit, vegetables and possibly things like Angus beef and leverage whatever vocabulary people are using.
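
A rough sketch of how such a property might be declared and used, in Turtle. Nothing here is part of the GS1 Web vocabulary: the property is only proposed in this thread, so it is placed in a hypothetical ex: namespace, and the variety IRI is a placeholder standing in for whatever external vocabulary a publisher already uses.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.org/voc/> .   # hypothetical namespace

# Proposed object property; no rdfs:domain or rdfs:range is asserted,
# so it could apply to fruit, vegetables or other products such as beef
ex:varietal a owl:ObjectProperty ;
  rdfs:label "varietal"@en ;
  rdfs:comment "Links a product to a variety, cultivar or breed identified in an external vocabulary."@en .

# Usage: the object is an IRI from any vocabulary of varieties (placeholder here)
<https://id.gs1.org/01/09520876543219>
  ex:varietal <https://example.org/varieties/tomato-variety-1> .
```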