gs1 / WebVoc

GS1 Web vocabulary development site
Apache License 2.0

question: why not reference existing ontologies for use with gs1:AllergenTypeCode #32

Open justin2004 opened 3 years ago

justin2004 commented 3 years ago

I see that gs1 has a way to indicate when Peas act as an allergen:

gs1:AllergenTypeCode-PEAS a gs1:AllergenTypeCode ;
    rdfs:label "Peas and Pea Products"@en ;
    rdfs:comment "Refers to the presence of peas and pea products as listed in the regulations specified in AllergenSpecificationAgency and AllergenSpecificationName."@en ;
    skos:prefLabel "PEAS" ;
    gs1:originalCodeValue "NE" .

But what about lentils, black-eyed peas, kale, mustard greens, etc.?

It doesn't seem to be practical for a specific reference to Peas to live in gs1. It also doesn't seem to be practical for all the other plants I mentioned to live in gs1 so why put some in gs1 arbitrarily?

Instead why not reference an ontology that specializes in food such as FOODON?

e.g.
Peas == http://purl.obolibrary.org/obo/NCBITaxon_3888
kale == http://purl.obolibrary.org/obo/FOODON_03411281

If that was done then gs1 could just have the single class gs1:AllergenTypeCode and use a single object property to indicate the allergen (which is defined in a food ontology).

It seems like the inclusion of any specific food at all commits gs1 to being in the food taxonomy business (and it will never be as good as a domain specific ontology for food).

mgh128 commented 3 years ago

Hi @justin2004

Thanks for the feedback. The GS1 Web vocabulary was updated a few months ago to ensure that its code list gs1:AllergenTypeCode was (at the time) up to date with the corresponding allergen code list used within GS1 (see http://apps.gs1.org/GDD/Pages/clDetails.aspx?semanticURN=urn:gs1:gdd:cl:AllergenTypeCode&release=1 ). I see that there have been some further additions on the GDD/GDSN side since then, so a further update is probably needed. I also notice that the current list of allergen codes does not include lentils, black-eyed peas or kale, so I'll ask my colleagues who manage that code list throughout GS1 whether those should be added.

I don't determine which code values belong in that code list - I focus more on making sure that it's available as Linked Data and support the technical updates and tools for the GS1 Web vocabulary (such as the online browser at https://gs1.org/voc )

I'm sure that you already noticed that there are some values in that code list that are not what we think of as a foodstuff - I've never knowingly been offered a side serving of 1,3-BIS-(2,4-DIAMINOPHENOXY)PROPANE - but I was probably just not paying attention to the small print in the menu at the time. It's also possible that some of these allergens may be listed for non-food products such as cosmetics, healthcare products etc. since the GS1 community includes multiple industry sectors - not only food but healthcare, technical industries, general merchandise, transport and logistics etc.

There's technically an opportunity for GS1 to cross-reference related terms in the Food Ontology or other Linked Data ontologies - but unless there's an easy way to automate the compilation of those cross-references, I'm not sure whether GS1 will prioritise that in the near term.

I hope that provides some background about why gs1:AllergenTypeCode is the way it is currently. The GS1 Web vocabulary is GS1's attempt to make more of its data model and code lists available and expressible as Linked Open Data, so that anyone (e.g. a manufacturer or retailer) can describe products etc. in greater detail than using schema.org alone.

I'll pass on your feedback to my colleagues concerned with the GDSN/GDD data model and code lists, since it might also help to make the allergen code list more complete.

oldskeptic commented 3 years ago

...remembering the past conversation of #6 about ingredients and labels.

Is there any implementation guidance on how a pure GS1 web voc application would consume a GS1 web document with a missing code and additional data? It's unreasonable to expect that the code table can keep up with every single corner case (eg: Tomato allergies are missing). If there was a documented expected behaviour "Show the node label to the user when the gs1:AllergenDetails node is missing data" it would permit greater interoperability with other vocabularies and ensure "fail safe".
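One way the fail-safe behaviour suggested above could look in a consuming application is sketched below. It is only an illustration: the dict layout, the `gs1:allergenType` key, the `KNOWN_ALLERGEN_LABELS` table and the `display_text` helper are assumptions made for this sketch, not GS1 guidance. The PEAS label is copied from the Turtle quoted at the top of this issue.

```python
# Hypothetical sketch: names and data layout are assumptions, not GS1 guidance.

# Labels for the allergen codes this application recognises (tiny subset).
KNOWN_ALLERGEN_LABELS = {
    "gs1:AllergenTypeCode-PEAS": "Peas and Pea Products",
}


def display_text(allergen_node: dict) -> str:
    """Return the text to show a user for one allergen node.

    Prefer the label of a recognised allergen code; otherwise fall back to
    the node's own rdfs:label so the information is not silently lost.
    """
    code = allergen_node.get("gs1:allergenType")
    if code in KNOWN_ALLERGEN_LABELS:
        return KNOWN_ALLERGEN_LABELS[code]
    label = allergen_node.get("rdfs:label")
    if label:
        return label
    # Nothing usable: say so rather than dropping the allergen entirely.
    return "Unspecified allergen"
```

With this shape, a node that carries only a label (for example a tomato allergy with no matching code) is still shown to the user.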

justin2004 commented 3 years ago

Hi @mgh128

The GS1 Web vocabulary is GS1's attempt to make more of its data model and code lists available and expressible as Linked Open Data, so that anyone (e.g. a manufacturer or retailer) can describe products etc. in greater detail than using schema.org alone.

It is an excellent goal but I worry that trying to copy in a subset of a domain (such as food products) is a losing battle. The loss will happen when someone tries to use gs1 (web voc) to indicate that a product has the potential to be allergenic due to the presence of, say, kale (or anything not already copied into gs1). If I ran into that situation I would think "maybe gs1 isn't the right vocabulary" and I would find a way to use other ontologies together.

There's technically an opportunity for GS1 to cross-reference related terms in the Food Ontology or other Linked Data ontologies - but unless there's an easy way to automate the compilation of those cross-references, I'm not sure whether GS1 will prioritise that in the near term.

I don't think the team needs to automate it. A single upstream change that lets the Web Voc team add some T-Box representation would be enough for a downstream user to write something like:

:someProduct10 a gs1:FoodBeverageTobaccoProduct ;
    gs1:hasAllergen [
        ex:allergenicAgent <http://purl.obolibrary.org/obo/FOODON_03411281> ;
        gs1:allergenLevelOfContainmentCode gs1:LevelOfContainmentCode-CONTAINS
    ] .

Kale and all the other plants I mentioned above are instances of http://purl.obolibrary.org/obo/FOODON_03411564 ("food product organismal source").

I also notice that the current list of allergen codes does not include lentils, black-eyed peas or kale, so I'll ask my colleagues who manage that code list throughout GS1 whether those should be added.

Then I could just keep coming back with food products that are in FOODON but not in gs1.

mgh128 commented 3 years ago

I should probably also clarify that:

  1. any publisher of Linked Data is at liberty to combine any Linked Data vocabularies that they deem appropriate, e.g. schema.org + GS1 Web vocabulary + Food ontology + anything else that they might discover, e.g. using Linked Open Vocabularies ( https://lov.linkeddata.es/dataset/lov/ ). GS1 certainly does not impose any validation rules on use of its GS1 Web vocabulary - so if you want to express an additional triple that asserts that : https://example.com/01/01234567890128 ex:hasIngredient http://purl.obolibrary.org/obo/NCBITaxon_3888 . then you won't run into any problems with the non-existent GS1 Web vocabulary police.
  2. The code values within gs1:AllergenTypeCode are typically those required to be specified by various food safety regulations around the world and as far as I am aware, GS1 does not claim that this is an exhaustive list of all possible allergens, food ingredients, chemical compounds etc.
  3. The licence for the GS1 Web vocabulary (embedded within the JSON-LD or Turtle files and under the 'Licence' tab at https://gs1.org/voc ) clearly states that it is made available on an "as is" basis, so while we welcome feedback and suggestions for improvement, it's up to GS1 to decide which improvements it wants to make and what to prioritise. Here is a partial extract from the licence statement:
    THIS DOCUMENT IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGMENT, FITNESS FOR PARTICULAR PURPOSE, OR ANY WARRANTY OTHER WISE ARISING OUT OF THIS SPECIFICATION. GS1 disclaims all liability for any damages arising from use or misuse of this vocabulary, whether special, indirect, consequential, or compensatory damages, and including liability for infringement of any intellectual property rights, relating to use of information in or reliance upon this document. GS1 retains the right to make changes to this vocabulary at any time, without notice. GS1 makes no warranty for the use of this vocabulary and assumes no responsibility for any errors which may appear in the vocabulary, nor does it make a commitment to update the information contained herein. GS1 and the GS1 logo are registered trademarks of GS1 AISBL.
justin2004 commented 3 years ago

then you won't run into any problems with the non-existent GS1 Web vocabulary police.

:) I didn't expect I would!

It just always stands out to me when an ontology starts to "copy" (intentionally or unintentionally) a subset of a more comprehensive vocabulary (in this case, gs1 web voc unintentionally "copied" a subset of FOODON). It seems like it would benefit all linked data users if, when that situation is noticed, the ontologies instead cooperated explicitly.

Or maybe the rule of thumb might be that the newcomer cooperates with the existing ontology with no action needed from the existing ontology. If the newcomer doesn't cooperate/integrate with existing ontologies (for domains it cares about) then things aren't linked as well as they could be.

VladimirAlexiev commented 3 years ago

I agree with @justin2004 here.

@mgh128 the question is not which GS1 committee has begot a partial list of allergens. The question is why GS1 does not reuse more, and does not recommend to its own users to reuse external resources. Actually I'll pose the question to @philarcher who of all people I expect is the staunchest supporter of reusing LOD.

I have some examples from our EPCIS 2 effort:

VladimirAlexiev commented 3 years ago

But @justin2004 I have to ask you whether FOODON is really the best authority on allergens.

E.g. do they have all the Ennn additives?

Please check OpenFoodFacts. You can start with the OpenFoodFacts properties on Wikidata, I think 2-3 are fully captured in Wikidata.


Or actually....

@philarcher , the schema.org people recommended their users to use Wikidata as a source of individuals. I think you respect schema.org a lot, why can't gs1 Voc use a similar recommendation?

VladimirAlexiev commented 3 years ago

From an old issue https://github.com/gs1/WebVoc/issues/6#issuecomment-791554165

there are hundreds of thousands of possible ingredients, so it would be very difficult for us (GS1) to compile and maintain - so I think the best we can suggest is that you use ... https://www.gs1.org/voc/ingredientName

No, the best would be to recommend that gs1 users reuse well-developed external resources: WD, OFF, FOODON, USDA FoodData Central, LanguaL, FoodEx or something else that I'm not aware of.

mgh128 commented 3 years ago

I agree with @justin2004 here.

@mgh128 the question is not which GS1 committee has begot a partial list of allergens. The question is why GS1 does not reuse more, and does not recommend to its own users to reuse external resources. Actually I'll pose the question to @philarcher who of all people I expect is the staunchest supporter of reusing LOD.

I have some examples from our EPCIS 2 effort:

  • positive: microorganism and chemicalSubstance are recommended to be reused from some biological and chemical ontologies respectively
    • happily, GS1 did not think of itself as an authority on these matters, nor referred to some national documents, but deferred to communities who are real authorities on these matters
    • also importantly, these props are left without range, enabling frictionless reuse
  • negative: GS1 copied 50-something units from QUDT to class gs1:MeasurementType
    • the argument was an amount of FUD re the longevity and stability of QUDT
    • but we forgot Dimensionless units, so then added just one, skipping the whole complexity of logarithmic (eg decibels) vs count (eg of RBC vs cigarettes vs master boxes) vs percent (eg interest rate)
    • oh well, UNECE REC20 (which EPCIS uses for units) are also just starting on their LOD journey...

GS1 didn't copy 50-something units. Those things proposed to be present within gs1:MeasurementType are measurement types / physical properties, not the individual units such as kilogram, metre, pascal.
@VladimirAlexiev - I assume that you do understand the difference between units and measurement types ;-)

As for dimensionless units, as you well know, the group did discuss logarithmic, count, percent and ratio, but decided at this stage to include only a broad dimensionless type. Yet again you choose to misrepresent what happened because you preferred a different outcome, even though I agree with you. I even mentioned in the group discussion that these different kinds of dimensionless units are not interconvertible, only within each dimensionless subcategory (e.g. you can't convert a dimensionless count to a dimensionless ratio or to a dimensionless logarithmic unit).
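The interconvertibility point can be illustrated with a short sketch. The four kinds are the ones named in the comment above; the class names, the `can_convert` helper and the example units are hypothetical and do not reflect how gs1:MeasurementType is actually modelled.

```python
from dataclasses import dataclass
from enum import Enum


class DimensionlessKind(Enum):
    """The dimensionless subcategories named in the discussion."""
    COUNT = "count"
    RATIO = "ratio"
    PERCENT = "percent"
    LOGARITHMIC = "logarithmic"


@dataclass(frozen=True)
class DimensionlessUnit:
    name: str
    kind: DimensionlessKind


def can_convert(source: DimensionlessUnit, target: DimensionlessUnit) -> bool:
    """Allow conversion only within the same dimensionless subcategory."""
    return source.kind is target.kind


# Illustrative units only.
DECIBEL = DimensionlessUnit("decibel", DimensionlessKind.LOGARITHMIC)
NEPER = DimensionlessUnit("neper", DimensionlessKind.LOGARITHMIC)
EACH = DimensionlessUnit("each", DimensionlessKind.COUNT)
```

A single broad dimensionless type cannot express this check, which is the trade-off the group accepted for now.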

Regarding allergens, as I've already explained, GS1 monitors food safety legislation around the world and seeks to ensure that those allergens that are required to be declared by legislation / regulations are represented in the GDSN / GDD AllergenTypeCode list. A recent work request was made (not by me) to update the GS1 Web vocabulary to align with that code list. As you well know, anyone is free to mix and match Linked Data vocabularies if they do not find everything they need in one vocabulary - so the current situation is not harming anyone even if you or @justin2004 consider it suboptimal. If you don't like the way the GS1 Web vocabulary is, feel free to submit a work request to suggest a change. https://wr.gs1.org/

justin2004 commented 3 years ago

GS1 monitors food safety legislation around the world and seeks to ensure that those allergens that are required to be declared by legislation / regulations are represented in the GDSN / GDD AllergenTypeCode list.

@mgh128 That GS1 does that is great and expected. I think the mistake arises when GS1 Web Voc just makes an individual, e.g. gs1:AllergenTypeCode-PECAN_NUTS, in a vacuum.

GS1 Web Voc doesn't acknowledge that linked data existed before it: https://www.wikidata.org/wiki/Q1119911 http://purl.obolibrary.org/obo/FOODON_03315232 etc.

And because of that I don't think of GS1 Web Voc as a participant in linked data because it requires that someone else come along and say something like:

gs1:AllergenTypeCode-PECAN_NUTS ex:allergenicAgent wd:Q1119911 .

Why wouldn't GS1 Web Voc take what GS1 specifies and lift it into the world of linked data? Isn't that what it is supposed to do?

But @justin2004 I have to ask you whether FOODON is really the best authority on allergens.

@VladimirAlexiev I don't know enough about allergens to have an opinion. I just wanted to get the conversation started.

justin2004 commented 3 years ago

If you don't like the way the GS1 Web vocabulary is, feel free to submit a work request to suggest a change. https://wr.gs1.org/

I sent an email to the "Contact Us" address.

Maybe a concrete proposal would be useful.

Instead of just putting this in WebVoc:

gs1:AllergenTypeCode-PECAN_NUTS a gs1:AllergenTypeCode ;
    rdfs:label "Pecan Nut and Pecan Nut"@en ;
    rdfs:comment "Refers to the presence of pecan nut and pecan nut products as listed in the regulations specified in AllergenSpecificationAgency and AllergenSpecificationName."@en ;
    skos:prefLabel "PECAN_NUTS" ;
    gs1:originalCodeValue "SP" .

Why not put this in:

gs1:AllergenTypeCode-PECAN_NUTS a gs1:AllergenTypeCode ;
    rdfs:label "Pecan Nut and Pecan Nut"@en ;
    rdfs:comment "Refers to the presence of pecan nut and pecan nut products as listed in the regulations specified in AllergenSpecificationAgency and AllergenSpecificationName."@en ;
    skos:prefLabel "PECAN_NUTS" ;
    gs1:originalCodeValue "SP" ;
    ex:allergenicAgent wd:Q1119911 .

The last triple would make quite a difference to anyone that wants to write interesting queries.
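As a rough sketch of the kind of question that extra triple lets a consumer answer ("which products declare an allergen whose agent is wd:Q1119911?"), plain Python tuples can stand in for a triple store. `ex:allergenicAgent` is the hypothetical property from the proposal above; the product and blank-node identifiers, the `gs1:allergenType` link and the helper name are invented for illustration.

```python
# Triples as (subject, predicate, object) tuples; a stand-in for a real store.
TRIPLES = {
    # The proposed cross-reference (ex:allergenicAgent is hypothetical).
    ("gs1:AllergenTypeCode-PECAN_NUTS", "ex:allergenicAgent", "wd:Q1119911"),
    # An invented product description that uses the GS1 code.
    ("ex:product1", "gs1:hasAllergen", "_:a1"),
    ("_:a1", "gs1:allergenType", "gs1:AllergenTypeCode-PECAN_NUTS"),
}


def products_with_agent(triples, agent):
    """Return products whose declared allergen code maps to `agent`."""
    # Allergen codes cross-referenced to the external agent.
    codes = {s for s, p, o in triples
             if p == "ex:allergenicAgent" and o == agent}
    # Allergen detail nodes that use one of those codes.
    details = {s for s, p, o in triples
               if p == "gs1:allergenType" and o in codes}
    # Products that point at one of those detail nodes.
    return sorted(s for s, p, o in triples
                  if p == "gs1:hasAllergen" and o in details)
```

Without the first triple the join has nothing to start from, so the same call returns an empty list.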

mgh128 commented 3 years ago

Hi @justin2004

Thanks for the example. I didn't have any doubt about how we could technically add such cross-references. I also don't disagree that from a Linked Data perspective, this could be a good thing to do - but I'm not going to spend any time voluntarily determining those cross-references for each of our existing allergen codes. I do take your point that if GS1 decided that it was appropriate to link to Wikidata resources, then those Wikidata resources might be a sufficient 'hub' to link to other related terms elsewhere.

Unfortunately, there are many other things within GS1 that are currently much higher priority - at least in terms of the technical work that @philarcher or I do for GS1 - and it's unlikely that anyone else within GS1 (except perhaps an intern student with some Linked Data experience) would do this work for GS1.

GS1 appears to have no problem in referring to anything published by any de jure standards organisation such as ISO, W3C, IETF, UN CEFACT, ANSI, etc. or publications from government agencies and regulators - but to reference anything else, GS1 senior management (that's not Phil or me) would need to investigate the credibility and longevity/persistence of those other referenced datasets / resources and understand their procedures for publication, review and change management. That's not a question of FUD (as claimed by @VladimirAlexiev ) - it's just about trying to ensure that anything we do reference will still be accessible and maintained well into the future.

justin2004 commented 3 years ago

Hi @mgh128

I also don't disagree that from a Linked Data perspective, this could be a good thing to do

I do take your point that if GS1 decided that it was appropriate to link to Wikidata resources, then those Wikidata resources might be a sufficient 'hub' to link to other related terms elsewhere.

Now I am 100% sure we aren't talking past each other! :)

GS1 appears to have no problem in referring to anything published by any de jure standards organisation

Wikidata seems to be more comprehensive than any other linked data hub that I've seen but I'm not sure it will ever have the status of a de jure standards organization (because of its open nature).

What if instead we picked a property relating to a source system that might be a better match to "de jure standards organization" such as USDA NDB number?

e.g. Pecan Nuts are NDB number 12142 https://fdc.nal.usda.gov/fdc-app.html#/food-details/170182/nutrients

The USDA is pretty reputable and I wouldn't bet against its longevity.

Can I modify my proposal to be:

gs1:AllergenTypeCode-PECAN_NUTS a gs1:AllergenTypeCode ;
    rdfs:label "Pecan Nut and Pecan Nut"@en ;
    rdfs:comment "Refers to the presence of pecan nut and pecan nut products as listed in the regulations specified in AllergenSpecificationAgency and AllergenSpecificationName."@en ;
    skos:prefLabel "PECAN_NUTS" ;
    gs1:originalCodeValue "SP" ;
    ex:allergenicAgentNDB "12142" .

Then GS1 only has to recognize the USDA but we incidentally get Wikidata hub connectivity!
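A small sketch of the two-hop lookup this would enable. The first table mirrors the proposed `ex:allergenicAgentNDB` triple; the second stands for data a consumer would fetch from Wikidata's USDA NDB number property (P1978). Pairing 12142 with Q1119911 is assumed here from the identifiers mentioned in this thread and has not been checked against Wikidata; the table and function names are hypothetical.

```python
from typing import Optional

# Hop 1: GS1 allergen code -> USDA NDB number (the proposed triple).
NDB_BY_GS1_CODE = {
    "gs1:AllergenTypeCode-PECAN_NUTS": "12142",
}

# Hop 2: USDA NDB number -> Wikidata item (assumed pairing, see lead-in).
WIKIDATA_BY_NDB = {
    "12142": "http://www.wikidata.org/entity/Q1119911",
}


def wikidata_item_for(gs1_code: str) -> Optional[str]:
    """Follow GS1 code -> NDB number -> Wikidata item, or return None."""
    ndb_number = NDB_BY_GS1_CODE.get(gs1_code)
    if ndb_number is None:
        return None
    return WIKIDATA_BY_NDB.get(ndb_number)
```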

oldskeptic commented 2 years ago

@mgh128 Following up on this, gs1voc lives in both the OWL and RDFS worlds.

In an "Ideal OWL World", pretty much everything discussed above can be fixed with additional OWL statements from the publisher. In reality, I'm very sure that most applications only consider RDFS, and that's if they aren't naively parsing webvoc / schema.org as JSON strings. This means that much is left to application interpretation.

I'm not asking for a "GS1 Web vocabulary police", but writing down a few paragraphs of implementation guidance now would avoid having to maintain documents like Jarno's spreadsheet of search engine product variant interpretations.

In the interest of interoperability, how do you see publishers handling ingredients, allergens and packaging that aren't (for whatever sane reason) in the gs1 webvoc?

VladimirAlexiev commented 1 year ago

@oldskeptic @mgh128 @justin2004 @philarcher

[image]

VladimirAlexiev commented 1 year ago

I'm wrong, 9003 as well as a bunch of other variants of apple exist as "SR legacy foods": [image]

The problem was:

VladimirAlexiev commented 1 year ago

@justin2004 please see https://www.wikidata.org/wiki/Property_talk:P1978#FDC_ID

Wikidata has these props linking to important food databases:

mgh128 commented 1 year ago

Hi @VladimirAlexiev, @philarcher

Thanks for the reminder about this. I agree that there is a potential opportunity to reference related resources elsewhere. For example, if you look within the GPC browser ( https://gpc-browser.gs1.org/ ) and navigate through:

you can find Attribute 20002794 Apple Variety with attribute values that could map to the NDB Numbers / URLs in your previous comment, e.g.

Unfortunately, the GPC dataset does not yet appear to be available as Linked Data, nor does the current GPC browser provide direct URLs to a GPC brick such as 10005900 (for Apples) - but if we did all of that, then it would make sense to cross-reference to these resources elsewhere.

A few years ago, I developed for GS1 a prototype Linked Data representation of the GPC dataset and a browser tool (see for example https://mh1.eu/gpctest/10005900 ). I'm not sure what the current or future plans are for publishing the GPC dataset as Linked Data, but we should investigate and see if it can be reactivated. One of the challenges is that the dataset can be quite large because of the descriptions (including translations into many languages), so we might need to think about publishing it not only in its entirety but also in appropriate fragments that a GPC browser can load lazily.

VladimirAlexiev commented 1 year ago

@mgh128 Yes, I know about gpctest and we used it as formatterURL (external site) in wikidata: GS1 GPC code (P8957). Thanks for keeping it up!
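For readers unfamiliar with the mechanism: a Wikidata formatter URL is a template in which `$1` is replaced by the identifier value. A minimal sketch, assuming the pattern is `https://mh1.eu/gpctest/$1` (inferred from the example URL earlier in this thread, not checked against Wikidata); the helper name is hypothetical.

```python
# Assumed pattern, inferred from https://mh1.eu/gpctest/10005900 above.
GPCTEST_PATTERN = "https://mh1.eu/gpctest/$1"


def expand_formatter_url(pattern: str, value: str) -> str:
    """Substitute an identifier value for the $1 placeholder."""
    return pattern.replace("$1", value)
```

Pointing the identifier at a different host later would then only require changing the pattern, not the stored GPC codes.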

Could you also make it resolve attribute values, eg:

On the original issue:

mgh128 commented 1 year ago

The resources at https://mh1.eu/gpctest/10005900 etc. are only a prototype based on a snapshot of the GPC dataset as it was at the time it was developed. We could do a wholesale update to use the current dataset but I don't have the bandwidth to do incremental updates of individual GPC bricks, attributes or attribute values. However, there is a possibility that some GPC values previously assigned in earlier versions of the GPC dataset might have been deprecated or even removed, though they do maintain multiple historic versions. Ideally we'll get to a situation where the GPC dataset is available as Linked Data hosted at ref.gs1.org with an accompanying tool to make it more easily searchable / browsable - and at that point, I'd hope that you'd still be able to reconfigure the wikidata triples to point to the more stable resources maintained at ref.gs1.org, rather than my prototype based on a somewhat out-of-date dataset. When the corresponding resource is available at ref.gs1.org I'll set up URL redirects to that as the authoritative resource.

I do encourage you to consider participating in the new GS1 Web Technology SMG (see https://www.gs1.org/standards/development-work-groups ), in which @philarcher and I will be actively involved in various updates that relate to the GS1 Web vocabulary and other work requests making GS1 standards suitable for the Web. I hope that some of the topics you have mentioned today on GitHub are already on the radar for being made available as Web-friendly / Linked Data resources, hosted at ref.gs1.org , though I can't comment on the likely timelines for each of those.

mgh128 commented 1 year ago

Regarding references to external resources, I don't think @philarcher is at all opposed to this. The challenge is that the work of mapping to those external resources really needs to be pushed upstream to the experts in the GPC dataset, so that they see value in adding the appropriate mappings in their work and in maintaining them as the GPC dataset evolves over time. Their work requests to the Web Technology SMG at GS1 would then already include those mappings in a spreadsheet, so that we can easily include them within the GS1 Web vocabulary. The alternative is for @philarcher and me to work them out manually, one mapping at a time, which is not the best use of our time, especially when (I think you'd agree) there are so many other aspects that need to be Web-enabled / made more available as Linked Data. There might be some back-and-forth, and we might be able to support them with software tools (or outputs of those) that make a first attempt at determining the mappings automatically, but we'd really need the GPC team to review and refine those, then send us back what they consider to be the most appropriate mappings.
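If such a spreadsheet of mappings did arrive, turning it into triples could be largely mechanical. The sketch below assumes a CSV export with two hypothetical columns, `gs1_term` and `external_iri`, and uses `skos:closeMatch` as a placeholder predicate; none of these choices comes from GS1, and the prefix declarations are assumed to be supplied by the surrounding Turtle file.

```python
import csv
import io


def mappings_to_turtle(csv_text: str, predicate: str = "skos:closeMatch") -> str:
    """Turn rows of (gs1_term, external_iri) into Turtle statements.

    Column names and the default predicate are assumptions for this sketch.
    Rows without a mapping yet are skipped rather than emitted half-empty.
    """
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = (row.get("gs1_term") or "").strip()
        target = (row.get("external_iri") or "").strip()
        if not subject or not target:
            continue
        statements.append(f"{subject} {predicate} <{target}> .")
    return "\n".join(statements)


# Invented example input; the second row has no mapping yet.
EXAMPLE_CSV = (
    "gs1_term,external_iri\n"
    "gs1:AllergenTypeCode-PECAN_NUTS,http://www.wikidata.org/entity/Q1119911\n"
    "gs1:AllergenTypeCode-PEAS,\n"
)
```

The experts would still review each row; the script only removes the manual transcription step.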

VladimirAlexiev commented 1 year ago

@mgh128 @philarcher I considered participating in the Web Technology SMG. I enlisted for that session at the past GS1 Standards Week. But when I learned that I need to sign paperwork just to attend an information session, I gave up. IMHO GS1 should reconsider its bureaucracy policies if it wants to attract more volunteer contributors.

mgh128 commented 1 year ago

Hi @VladimirAlexiev

All GS1 mission-specific work groups and standards maintenance groups require signing of the GS1 IP Policy and opt-in to the groups in which you would like to participate. Your company already signed the IP Policy when you participated in the EPCIS MSWG. If you decide to join the group, you're still welcome to do so at any point in the future. More info at https://www.gs1.org/standards/development-work-groups