w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

review used schema.org terms for existence and relevance #271

Closed VladimirAlexiev closed 1 year ago

VladimirAlexiev commented 2 years ago

The first example in #270 has two more problems:

OR13 commented 2 years ago

TODO: find a better IRI for Purchase... or define it here.

OR13 commented 2 years ago

https://github.com/w3c-ccg/traceability-vocab/search?q=Purchase&type=code

Seems stale, should be closed.

nissimsan commented 2 years ago

@VladimirAlexiev , this is defined by us, not schema.org. https://github.com/w3c-ccg/traceability-vocab/issues/271 Closing.

VladimirAlexiev commented 2 years ago

Let me quote from https://github.com/w3c-ccg/traceability-vocab/blob/main/docs/openapi/components/schemas/common/BillOfLading.yml#L38

    $linkedData:
      term: relatedDocuments
      '@id': https://schema.org/Purchase

@nissimsan please reopen

OR13 commented 2 years ago

No, do not reopen... create a new issue with a clear and actionable description something like:

title: Update broken IRIs in BillOfLading RDF Class

description: This RDF Class contains links to terms that are incorrect (include examples).

VladimirAlexiev commented 2 years ago

@OR13 Please read the issue title. I gave just one example. Here's an idea: grep all schema.org terms from all schemas, and try to resolve those URLs to confirm their existence. Reminder: schema.org semantic URLs use http not https, even though the schema.org site redirects to https.

nissimsan commented 2 years ago

@VladimirAlexiev, can you elaborate on the latter, pls? It sure looks like https:
(image)

VladimirAlexiev commented 2 years ago

@OR13 I guess they changed it recently, despite objections about changing what are supposed to be permanent URLs.

But could you please check in the ontology (JSONLD or Turtle) to make sure?

TallTed commented 2 years ago

Apparently, the powers that be at schema.org haven't read the CoolURIs article, never mind that @danbri has been involved in the worlds of Linked Data and semantic webs nigh unto forever....

FWIW, generally, if not universally, http:// URIs for schema.org redirect to https:// when dereferenced, and while the latest revisions of schema.org do use https://, anyone who's made significant use of their vocab may find it a significant hurdle to change all instances of http://.

There's nothing wrong with http:// being in the identifier URI, and https:// being the way you get the description of that http://-identified entity. For good or ill, there's no way to make a universal RDF statement that "all https:// URIs are owl:sameAs all http:// URIs", but schema.org (and others running into similar issues) could relatively easily include such a declaration on each term in their vocab document. Maybe if enough different schema.org users whine about this, they'll do it.

nissimsan commented 2 years ago

(image) https:// alright.

danbri commented 2 years ago

re

Maybe if enough different schema.org users whine about this, they'll do it.

We would respond to data consumers saying they'd use it.

However it is not clear what property to use to associate non-type, non-property terms, e.g.

http://schema.org/AudiobookFormat and https://schema.org/AudiobookFormat

I am not convinced owl:sameAs works, as it is such a strong claim.

nissimsan commented 2 years ago

Noting that a http -> https redirect still breaks the Verifiable Credential proof.
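One way to see why the scheme matters for signed data: a proof is computed over the exact IRI strings in the credential, so a document whose IRIs are rewritten from http: to https: no longer hashes to the same value. The following is a minimal sketch, assuming plain SHA-256 over sorted statement lines as a stand-in for real RDF canonicalization and signing; the `digest` helper and the example statement are hypothetical and not taken from any Verifiable Credential library.

```python
import hashlib


def digest(statements):
    """SHA-256 over sorted statement lines (a stand-in, not real RDF canonicalization)."""
    canonical = "\n".join(sorted(statements)) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical credential statement; the two versions differ only in the IRI scheme.
signed = ['<urn:example:doc> <http://schema.org/name> "Bill of Lading" .']
rewritten = [s.replace("http://schema.org/", "https://schema.org/") for s in signed]

print(digest(signed) == digest(rewritten))  # False
```

The sketch only shows that the identifier string itself is part of the signed bytes. It says nothing about whether dereferencing an http: identifier through a redirect should affect verification.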

danbri commented 2 years ago

If you want verifiable-credentials level assurance, you probably should be using https: throughout (and avoid remote context URLs https://docs.google.com/document/d/1Jo4-dTDo1osL3Mr9-THqwKywWu-NiFGagFTQV5mr-N8/edit# )

On Thu, 25 Aug 2022 at 09:54, Nis Jespersen @.***> wrote:

Noting that a http -> https redirect still breaks the Verifiable Credential proof.


nissimsan commented 2 years ago

Yes, agreed, @danbri.

More generically, I consistently stick with whatever I get redirect to - http or https, cool or uncool. Simple and safe.

TallTed commented 2 years ago

@nissimsan -- Though it may appear to be both, "[sticking] with whatever [you] get redirect [sic] to" is NEITHER simple nor safe.

Browser redirection from the URI of a term being described, to the URI of the description of that term, does not imply in any way that the term being described is identified by the URI of the description of that term!

TallTed commented 2 years ago

[@nissimsan] Noting that a http -> https redirect still breaks the Verifiable Credential proof.

The identifier of an entity being in the http URI scheme should not break any proof, even if the identifier of the description of that entity (which may be reached by dereferencing the identifier of that entity) is identified by a URI in the https URI scheme!

danbri commented 2 years ago

There are also the various kinds of HTTP redirection available. FWIW Schema.org's http: to https: redirections use "HTTP/1.1 301 Moved Permanently".

TallTed commented 2 years ago

@danbri --

However it is not clear what property to use to associate non-type, non-property terms, e.g.

http://schema.org/AudiobookFormat and https://schema.org/AudiobookFormat

I am not convinced owl:sameAs works, as it is such a strong claim.

Are you saying that "you" (schema.org) mean (or meant) to identify two different entities by those two URIs (or really, any two URIs which differ only in their scheme, http vs https; note please I'm not talking about any two schemes, where I agree, it's more complex)?

Or is the latter simply a newer identifier (a co-referer) for the same entity, as would be communicated by putting an owl:sameAs relation (in either direction, i.e., in either description, though optimally it would be in both) between them?

TallTed commented 2 years ago

[@danbri] FWIW Schema.org's http: to https: redirections use "HTTP/1.1 301 Moved Permanently"

That would seem to support my assertion that owl:sameAs is the intended relation between the http: and https: identifiers, directed from the former to the latter, though as a reflexive, symmetric, and transitive relation, it could be stated in either direction.

danbri commented 2 years ago

On Thu, 25 Aug 2022 at 16:58, Ted Thibodeau Jr @.***> wrote:

@danbri https://github.com/danbri --

However it is not clear what property to use to associate non-type, non-property terms, e.g.

http://schema.org/AudiobookFormat and https://schema.org/AudiobookFormat

I am not convinced owl:sameAs works, as it is such a strong claim.

Are you saying that "you" (schema.org) mean (or meant) to identify two different entities by those two URIs (or really, any two URIs which differ only in their scheme, http vs https; note please I'm not talking about any two schemes, where I agree, it's more complex)?

Or is the latter simply a newer identifier (a co-referer) for the same entity, as would be communicated by putting an owl:sameAs relation (in either direction, i.e., in either description, though optimally it would be in both) between them?

I am saying you can go either way on this stuff.

When Dublin Core moved from http://purl.org/dc/Creator to http://purl.org/dc/terms/creator (or similar), ... were those two very similar properties, or essentially the same one? There is no best practice on this really.

I lean towards not spraying owl:sameAs around everywhere since it then becomes hard to say different things about each of them, e.g.

http://schema.org/Person owl:sameAs https://schema.org/ means we can't usefully say http://schema.org/Person schema:supersededBy https://schema.org/

For me it seems more useful to state which one is the currently preferred one, rather than just to say they identify the same thing. It is good to nudge things away from http:, unless something radical https://www.w3.org/DesignIssues/Security-NotTheS.html happens.

TallTed commented 2 years ago

@danbri

(Not treating your last in order....)

Well, I would hope you wouldn't spray these anywhere meant to be consumed as RDF --

http://schema.org/Person owl:sameAs https://schema.org/ means we can't usefully say http://schema.org/Person schema:supersededBy https://schema.org/

First, this makes no sense to me, as written --

http://schema.org/Person owl:sameAs https://schema.org/

-- though perhaps you meant to write --

<http://schema.org/Person> owl:sameAs <https://schema.org/Person>

Likewise, perhaps you meant this --

http://schema.org/Person schema:supersededBy https://schema.org/

-- to be written --

<http://schema.org/Person> schema:supersededBy <https://schema.org/Person>

Now, if you are concerned about versioning -- whether of http://schema.org/ (or https://schema.org/) writ large (i.e., all terms therein), or of https://schema.org/Person writ small -- I think that has some validity.

I think that validity is best addressed by identifying the "old" description which was superseded with some versioned URI which is linked from the "new" description which is identified by some other versioned URI.

Dereferencing the un-versioned URI should, in my opinion, always lead to the latest/current description, which should include a link to at least the most-recent previous description (recursive, so each description links to the next-most-recent), if not to all previous descriptions.

Dereferencing any versioned URI should lead to that version of the description, which optimally would include links to both more and less recent descriptions.

Your example of --

Dublin Core moved from http://purl.org/dc/Creator to http://purl.org/dc/terms/creator (or similar)

-- seems to me rather significantly different than migrating schema.org from the HTTP (unencrypted end-to-end) protocol to the HTTPS (encrypted end-to-end) protocol, whether or not the URIs that are used to identify entities are left as http:-scheme or migrated to https:-scheme.

It is unfortunate that many treat the URI of the HTML or other rendition of a description of a dereferenced URI (to which they may be routed by various 3xx and other means) as if it were the URI of the entity identified by the initial URI. Rather, the URI of the entity identified by the initial URI should be included within the HTML or other rendition of a description of a dereferenced URI which description should optimally be identified by its own URI, but all too often is not.

VladimirAlexiev commented 2 years ago

The switch-over was done in 12.0 on 2021-03-08 (see https://schema.org/docs/releases.html). I bitched about it for a long time, @danbri allowed a reasonable amount of discussion, and obviously schema.org won't go back to http, and there are good reasons to modernize to https. @OR13 Are there significant numbers of existing VCs that are broken by this switch-over? I think that all new VCs should use https for schema.org, and possibly even for other ontologies (and ask the creators of those other ontologies what they think about a switch).

So I suggest not to sidetrack this issue with that other issue.

grep all schema.org terms from all Traceability schemas, and try to resolve those URLs to confirm their existence.

@OR13 what do you think?

OR13 commented 2 years ago

This isn't the schema.org repo... AFAIK, we updated all the references to use https... so that we don't have any issues with this.... I prefer more, smaller, more actionable issues... rather than issues that are "review all links"... I welcome a separate issue per discovered broken link.

danbri commented 2 years ago

FWIW the last schema.org release contained this file, https://github.com/schemaorg/schemaorg/blob/main/data/releases/14.0/httpequivs.ttl which asserts owl:equivalentClass and owl:equivalentProperty relationship between http: and https: term URIs. I don't think it has a treatment for Enumeration members. /cc @rjw
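To make the shape of such an equivalence assertion concrete, here is a minimal sketch that builds one Turtle statement per term. It assumes the `owl:` prefix is declared elsewhere, and the `http_equivalent` helper is hypothetical. The direction and exact layout of the statements in the actual httpequivs.ttl file have not been checked here.

```python
def http_equivalent(term_iri, kind):
    """Build one Turtle statement linking the http: form of a term to its https: form.

    kind is "class" or "property"; enumeration members are deliberately not handled.
    """
    predicates = {
        "class": "owl:equivalentClass",
        "property": "owl:equivalentProperty",
    }
    if not term_iri.startswith("https://schema.org/"):
        raise ValueError("expected an https://schema.org/ term IRI")
    http_iri = "http://" + term_iri[len("https://"):]
    return f"<{http_iri}> {predicates[kind]} <{term_iri}> ."


print(http_equivalent("https://schema.org/Person", "class"))
# <http://schema.org/Person> owl:equivalentClass <https://schema.org/Person> .
```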

p.s. yes sorry I had a typo in my earlier response to @TallTed

And +1 to @OR13 re actionable issues.

TallTed commented 2 years ago

https://github.com/schemaorg/schemaorg/blob/main/data/releases/14.0/httpequivs.ttl which asserts owl:equivalentClass and owl:equivalentProperty relationship between http: and https: term URIs

Hallelujah! Glad to hear it! (Hope I don't forget it!)

VladimirAlexiev commented 2 years ago

@OR13 Please reopen this: it's a task to grep all schema.org terms and check them for existence in schema.org. Do you expect me to find all cases? I've started posting specific cases, but can't someone else share in this work?

Here's a first cut, collapsing cases already reported in other issues:

grep -hr 'http.*schema\.org' . | perl -pe "s{^ +}{}; s{ '}{}; s{'$}{}; s{https://https://}{https://}; s{www\.}{}" | sort | uniq > schema.txt

Attached: schema.txt
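For anyone who would rather not depend on grep and perl, the same extraction can be sketched in Python. This is an illustration and not the script that produced the attachment: the `extract_schema_terms` helper, the regular expression, and the sample text (including the `www.` line) are assumptions.

```python
import re

# Matches http(s) IRIs on schema.org, with or without a leading "www.".
# A doubled "https://https://" prefix is handled because the match simply
# starts at the second "https://".
SCHEMA_IRI = re.compile(r"https?://(?:www\.)?schema\.org/[A-Za-z0-9_/#-]*")


def extract_schema_terms(text):
    """Return the sorted, de-duplicated schema.org IRIs found in text."""
    terms = set()
    for match in SCHEMA_IRI.finditer(text):
        terms.add(match.group(0).replace("://www.", "://"))
    return sorted(terms)


# Hypothetical schema fragment used only to exercise the function.
sample = """
$linkedData:
  term: relatedDocuments
  '@id': https://schema.org/Purchase
  '@type': https://www.schema.org/identifier
"""

print(extract_schema_terms(sample))
# ['https://schema.org/Purchase', 'https://schema.org/identifier']
```

To cover a whole checkout, the function could be applied to the contents of each schema file in turn and the results merged into one set.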

nissimsan commented 2 years ago

@VladimirAlexiev - reopening. The floor is yours... :)

VladimirAlexiev commented 2 years ago

@nissimsan Isn't anyone going to help with the list I made?

brownoxford commented 1 year ago

Discussed on call, please review and indicate whether you are able to assist on this ticket.

OR13 commented 1 year ago

I suggest running the script and filing separate issues, and then closing this issue.

Issues that are large and not actionable tend to not progress well.

nissimsan commented 1 year ago

We do use https://schema.org/Purchase on Bill of Lading. Def. seems like a mistake. I'll remove this.

VladimirAlexiev commented 1 year ago

@nissimsan The attachment shows 161 schema terms in traceability schemas. Will you make a HEAD request for each of these URLs and see whether they resolve?
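The HEAD check suggested here could be sketched as follows, using only the Python standard library. It assumes that schema.org answers unknown terms with a non-200 status, which should be confirmed before relying on the result. The `head_status` and `find_unresolved` names are hypothetical, and network failures (`URLError`) are left unhandled for brevity.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the final HTTP status for a HEAD request (urllib follows redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the server reports the term as missing


def find_unresolved(urls, probe=head_status):
    """Return the URLs whose probe result is not 200, in input order."""
    return [url for url in urls if probe(url) != 200]
```

Passing the lines of schema.txt to `find_unresolved` would give a list of candidate broken terms, each of which could then be filed as its own issue. The `probe` parameter exists so the logic can be exercised without network access.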

Here's a mistake: this lowercase term is a prop, so it cannot be used as @type:

'@type': https://schema.org/identifier

BenjaminMoe commented 1 year ago

@VladimirAlexiev can you open a separate issue for this? https://github.com/w3c-ccg/traceability-vocab/issues/271#issuecomment-1602110320
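The `@type` case reported above (a lowercase property IRI used where a type is expected) is the kind of mistake a simple lint could flag. A minimal sketch, relying on the schema.org naming convention that type names start with an uppercase letter and property names with a lowercase one; the `looks_like_schema_type` helper is hypothetical, and the check is a heuristic, not a lookup against the vocabulary.

```python
def looks_like_schema_type(iri):
    """Heuristic: schema.org types start with an uppercase letter, properties with lowercase."""
    prefix = "https://schema.org/"
    if not iri.startswith(prefix):
        return False
    local = iri[len(prefix):]
    return bool(local) and local[0].isupper()


print(looks_like_schema_type("https://schema.org/identifier"))  # False
print(looks_like_schema_type("https://schema.org/Person"))      # True
```

Names that begin with a digit would be rejected by this check, so any hit should be confirmed against the vocabulary before an issue is filed.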