xsd:string vs rdf:langString

VladimirAlexiev commented 3 years ago

EPCIS includes very few strings (4-5), and they are all code-like, so they are correctly declared as xsd:string.

But gs1:certificationStandard, gs1:certificationAgency and gs1:certificationValue in CertificationInfo (#245) are declared as rdf:langString (following WebVoc's multilingual orientation). I don't see any cardinality restrictions, so this allows providing values in different languages, e.g.:

```
"certificationInfo": {
  "gs1:certificationStandard": [
    {"@language": "en", "@value": "Food safety standard"},
    {"@language": "bg", "@value": "Стандард за безопасност на храните"}
  ]
}
```
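Allowing several languages per property means a consumer has to choose among them. A minimal sketch of that choice, assuming the value-object shape above ("@language"/"@value") and a caller-supplied preference order; `pick_label` is a hypothetical helper, not part of EPCIS or the GS1 Web vocabulary.

```python
from typing import Optional, Union


def pick_label(values: Union[dict, list], preferred=("en",)) -> Optional[str]:
    """Return the @value whose @language best matches `preferred`.

    `values` may be a single value object or a list of them. Language tags are
    compared case-insensitively (BCP 47 tags are case-insensitive). If none of
    the preferred languages is present, fall back to the first value.
    """
    if isinstance(values, dict):
        values = [values]
    if not values:
        return None
    by_lang = {}
    for v in values:
        tag = (v.get("@language") or "").lower()
        by_lang.setdefault(tag, v["@value"])  # keep the first value per language
    for lang in preferred:
        if lang.lower() in by_lang:
            return by_lang[lang.lower()]
    return values[0]["@value"]


certification_standard = [
    {"@language": "en", "@value": "Food safety standard"},
    {"@language": "bg", "@value": "Стандард за безопасност на храните"},
]
print(pick_label(certification_standard, preferred=("bg", "en")))
```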

Questions:

1. Do we expect people to use the keys "@language" and "@value" in JSON?
2. Or do we want to define aliases in the context, e.g. "lang": "@language"? (But "value" already maps to epcis:value.)
3. Do we expect users to include the gs1 WebVoc context in their JSON-LD (#263)?
4. Maybe it's better to declare the range of these properties as rdf:Literal (or schema:rangeIncludes xsd:string, rdf:langString; or an sh:or of these datatypes) to allow the much simpler common case of a single name with no lang tag:
```
"certificationInfo": {
  "gs1:certificationStandard": "Food safety standard"
}
```
5. stringValue in SensorReport is defined as xsd:string. Do we ever want it to carry a lang tag to indicate the language?
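Question 4 amounts to accepting either of two JSON shapes for the same property. As an illustration only (a real deployment would state this declaratively, e.g. with schema:rangeIncludes or a SHACL sh:or, not in code), here is a minimal check a JSON consumer could run. It assumes the context sets no default @language and no @type coercion for the property, so a bare string is read as xsd:string; `is_plain_literal` and `check_property` are hypothetical names.

```python
def is_plain_literal(value) -> bool:
    """True if `value` reads as xsd:string or rdf:langString in compacted JSON-LD.

    Assumes no default @language and no @type coercion in the context, so a
    bare JSON string is an xsd:string literal.
    """
    if isinstance(value, str):
        return True  # xsd:string
    if isinstance(value, dict) and "@value" in value:
        if not set(value) <= {"@value", "@language"}:
            return False  # e.g. {"@value": ..., "@type": ...} is some other datatype
        if not isinstance(value["@value"], str):
            return False  # JSON numbers/booleans map to other XSD datatypes
        # With @language: rdf:langString; without it: plain xsd:string.
        return isinstance(value.get("@language", ""), str)
    return False


def check_property(values) -> bool:
    """Accept a single value or an array of values, as JSON-LD allows either."""
    items = values if isinstance(values, list) else [values]
    return all(is_plain_literal(v) for v in items)
```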

mgh128 commented 3 years ago

My answers:

1. As far as possible, we'd like to avoid any JSON-LD @-prefixed keywords in the JSON, so we have tried to use aliases as much as possible. We have not yet defined an extensive JSON-LD context resource for the GS1 Web vocabulary, possibly because we were less concerned there with hiding the JSON-LD keywords. So, if we want to hide "@language" and "@value" in JSON, should we consider using language maps, as in https://www.w3.org/TR/json-ld11/#example-71-language-map-expressing-a-property-in-three-languages ?
2. Language maps would avoid the need to define aliases for @language and @value, especially as we already have epcis:value within epcis:SensorReport. Or perhaps we should have used a different term instead of epcis:value, e.g. epcis:doubleValue?
3. The gs1 WebVoc context is not yet available, but we could expect EPCIS data that uses terms from the GS1 Web vocabulary (for master data or gs1:CertificationDetails) to include the context resource for the GS1 Web vocabulary, although that could end up being quite a large file, even if it can be cached.
4. rdf:Literal might be too broad, but the logical OR (union) of rdf:langString and xsd:string might make more sense.
5. I don't think we envisaged that a sensor reporting a string value would also provide a language tag, even if the string were in a language other than English, and without one we can't correctly write it as an rdf:langString. We need to discuss with the group the use cases for sensors that provide string values, and whether a language is ever unambiguously specified, or perhaps just hidden somewhere within the deviceMetaData (which could be in any format or structure).
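On the language-map option in point 1: in JSON-LD 1.1, a term whose context definition has "@container": "@language" takes an object keyed by language tag, and a JSON-LD processor expands it back into @value/@language pairs, with the key "@none" standing for untagged strings. The sketch below hand-rolls that expansion step only to show the correspondence; in practice a JSON-LD library does this from the context, and `expand_language_map` is a hypothetical name.

```python
def expand_language_map(lang_map: dict) -> list:
    """Turn a language-map value into a list of JSON-LD value objects.

    Each key is a language tag (or "@none" for no tag); each entry is a
    string or an array of strings. null entries are dropped.
    """
    result = []
    for lang, items in lang_map.items():
        if not isinstance(items, list):
            items = [items]
        for item in items:
            if item is None:
                continue
            obj = {"@value": item}
            if lang != "@none":
                obj["@language"] = lang
            result.append(obj)
    return result


# Compacted form with a language map: no @language/@value keys in the JSON.
certification_standard = {
    "en": "Food safety standard",
    "bg": "Стандард за безопасност на храните",
}
print(expand_language_map(certification_standard))
```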

VladimirAlexiev commented 3 years ago

@mgh128 Let's discuss lang tags here, not in #263

3: I wouldn't worry about a JSON-LD client being able to process and cache the WebVoc context. But the schema.org people were worried about a server's ability to serve a big context reliably, despite caching. I wrote about that in WebVoc.

4: Yes, rdf:Literal is too broad; rdf:PlainLiteral is exactly what's needed. Let's discuss it in https://github.com/gs1/WebVoc/issues/22 (where I give 5 options), not here.

5: Ask the group, but I'd be happy to stick with xsd:string. Sensors are not poets :-)

2: JSON-LD language shortcuts

@mgh128 on "language map": https://github.com/gs1/EPCIS/issues/263#issuecomment-827037877
Me on the complications it brings: https://github.com/gs1/EPCIS/issues/263#issuecomment-827767025
JSON-LD provides various shortcuts related to lang tags. I think the key question is how this JSON will be generated and accessed, because there are two opposing forces:
- economy of representation, vs.
- uniformity of consumption

If economy should be maximized, it's best to use the range rdf:PlainLiteral, then include a lang tag when you have one and omit it when you don't. Assuming the aliases "lang": "@language" and "val": "@value":

```
{
  "code": "code-not-natural-lang",
  "label": {"val": "Some label", "lang": "en"},
  "labels": [
    "Some label, I'm not sure about lang",
    {"val": "Some label", "lang": "en"}
  ]
}
```
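The cost of economy falls on consumers, who must handle three shapes for the same kind of field (bare string, single object, array). A minimal consumer-side sketch, assuming the "val"/"lang" aliases above and reading the JSON directly rather than through a JSON-LD processor; `normalize` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Aliases assumed to be defined in the context, as in the example above.
ALIASES = {"val": "@value", "lang": "@language"}


def normalize(value) -> List[Tuple[str, Optional[str]]]:
    """Flatten an economical value into (text, lang) pairs; lang is None if untagged."""
    items = value if isinstance(value, list) else [value]
    pairs = []
    for item in items:
        if isinstance(item, str):
            pairs.append((item, None))  # bare string: xsd:string, no lang tag
        else:
            # Resolve aliased keys ("val", "lang") to the JSON-LD keywords.
            obj = {ALIASES.get(k, k): v for k, v in item.items()}
            pairs.append((obj["@value"], obj.get("@language")))
    return pairs


doc = {
    "code": "code-not-natural-lang",
    "label": {"val": "Some label", "lang": "en"},
    "labels": ["Some label, I'm not sure about lang", {"val": "Some label", "lang": "en"}],
}
for key in ("code", "label", "labels"):
    print(key, normalize(doc[key]))
```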

If uniformity should be maximized, then you want to always use the same structure for the same field, which leads to extraneous adornments:

- using @none for strings with no lang tag
- using [...] for multivalue fields, even when there's a single value
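To make those adornments visible, here is a sketch that rewrites the economical label values into the uniform shape. It assumes the property is declared as a language map, so that "@none" is available for untagged strings (in JSON-LD 1.1, "@container": ["@language", "@set"] also keeps single values in arrays on compaction). The input uses the "val"/"lang" aliases above, and `to_uniform` is a hypothetical name.

```python
def to_uniform(value) -> dict:
    """Rewrite an economical value into a uniform language map.

    Every key is a language tag or "@none", and every entry is a list,
    even when there is only one string for that language.
    """
    items = value if isinstance(value, list) else [value]
    lang_map = {}
    for item in items:
        if isinstance(item, str):
            text, lang = item, "@none"  # untagged string
        else:
            text, lang = item["val"], item.get("lang", "@none")
        lang_map.setdefault(lang, []).append(text)
    return lang_map


print(to_uniform({"val": "Some label", "lang": "en"}))
print(to_uniform(["Some label, I'm not sure about lang", {"val": "Some label", "lang": "en"}]))
```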

VladimirAlexiev commented 3 years ago

Agreed approach 2:

Define aliases in the context: {"lang": "@language", "val": "@value"}, and tell users to express their strings as in the economy example above (i.e. express a lang tag only when they have one).
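A minimal producer-side sketch of the agreed approach: the context carries the two aliases, and the writer emits a lang tag only when it has one. Everything beyond the two aliases is an assumption for illustration: the "gs1" prefix entry (added so the "gs1:" terms resolve), the helper `literal` and the sample agency value are hypothetical, and the actual narrowly scoped context resources are the ones discussed in #263.

```python
import json
from typing import Optional, Union

# Hypothetical context: the agreed aliases plus the GS1 Web vocabulary
# namespace so that the "gs1:" prefix resolves.
CONTEXT = {
    "gs1": "https://gs1.org/voc/",
    "lang": "@language",
    "val": "@value",
}


def literal(text: str, lang: Optional[str] = None) -> Union[str, dict]:
    """Emit a bare string when no language is known, else an aliased value object."""
    return text if lang is None else {"val": text, "lang": lang}


doc = {
    "@context": CONTEXT,
    "gs1:certificationStandard": literal("Food safety standard", "en"),
    "gs1:certificationAgency": literal("Example Agency"),  # hypothetical value, no lang tag known
}
print(json.dumps(doc, ensure_ascii=False, indent=2))
```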

mgh128 commented 3 years ago

^^^ This applies when we're using master data properties from the GS1 Web vocabulary that expect language-tagged strings. We're not proposing any change to the existing CBV master data attributes / ILMD that only expect a plain xsd:string value, i.e. we're not breaking backward compatibility, just trying to make it more JSON-friendly when the GS1 Web vocabulary properties are used instead.

We'll do this within the narrowly scoped context resources for CertificationDetails and master data attributes - as mentioned in https://github.com/gs1/EPCIS/issues/263#issuecomment-852176222

CraigRe commented 3 years ago

EPCIS § 7.4.1.3 will address certificationInfo, including non-normative examples with langString.

gs1/EPCIS #262