linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs
https://linked.art/

http or https for canonical URIs of Getty, Wiki*, and others? #577

Open beaudet opened 4 months ago

beaudet commented 4 months ago

We say at https://linked.art/api/1.0/protocol/ that https is the preferred protocol for Linked Art implementations, and presumably that also applies to the canonical URIs of published entities.

Getty is still reporting http:// at the top of their concept pages, e.g.: http://vocab.getty.edu/aat/300311458

Will referencing the AAT with https:// create problems when linking up data sets, because of the difference in URI scheme? I see that Yale is using http for both Getty and Wikidata; Wikimedia, by the way, might have officially switched theirs to https. So my question for those familiar with processing data sets from multiple institutions is: is this problem so common that any system consuming linked data has to solve it anyway, or should implementations pay close attention to the examples given by the authorities when those examples are present? Or, so long as an http -> https redirect is in place at the authority, will either work?
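
(As an illustration of the redirect question, a minimal sketch using Python's requests library; the URIs are just the examples from this thread, and it only checks HTTP-level behaviour, not which form of the URI is asserted in the published RDF.)

```python
# Sketch: see whether these authorities redirect plain http requests to https.
# Assumes the requests library; the URIs are the examples from this thread.
import requests

for uri in (
    "http://vocab.getty.edu/aat/300311458",
    "http://www.wikidata.org/entity/Q42",
):
    # Some servers answer HEAD oddly; fall back to GET if this misbehaves.
    resp = requests.head(uri, allow_redirects=True, timeout=10)
    print(f"{uri} -> {resp.url} ({resp.status_code})")
```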

azaroth42 commented 4 months ago

Yes, Linked Art implementations should use https if at all possible. At this point in the evolution of the web, I think it's borderline irresponsible not to use HTTPS.

For the URIs (as opposed to URLs) of existing instances and ontological terms, however, what matters is consistency; otherwise the graph doesn't connect properly, and applications relying on the exact URI don't process things as expected.
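
(To make the "graph doesn't connect" point concrete, a minimal sketch using rdflib, with a placeholder label: the http and https forms of the same identifier are simply two different nodes in RDF.)

```python
# Sketch: why the exact URI matters. The http:// and https:// forms of an
# identifier are two different RDF nodes, so statements about them never merge.
# Assumes rdflib; the label is a placeholder, not the real AAT label.
from rdflib import Graph, URIRef, Literal, RDFS

g = Graph()
g.add((URIRef("http://vocab.getty.edu/aat/300311458"), RDFS.label, Literal("placeholder label")))
g.add((URIRef("https://vocab.getty.edu/aat/300311458"), RDFS.label, Literal("placeholder label")))

print(len(g))                  # 2 triples
print(len(set(g.subjects())))  # 2 distinct subjects: the graph does not connect them
```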

The namespaces we use for the instances we might refer to should, I believe, be correct; please correct me if not.

The issue that would come up is if you dereference an http URI in a browser via XHR/fetch from within an environment that is served via HTTPS, you get the mixed active content error. This comes up in IIIF relatively often when some organizations serve content via HTTP and others via HTTPS, and then Mirador via HTTPS won't load the HTTP manifest.

Docs from Mozilla on mixed content are very good: https://developer.mozilla.org/en-US/docs/Web/Security/Mixed_content

beaudet commented 4 months ago

I think Wikidata recommends https, but that seems to be from a security perspective, not necessarily a change in the scheme of its canonical URIs.

It looks like RKD's permalinks use https:

https://rkd.nl/artists/10024

azaroth42 commented 4 months ago

Wikidata asserts their namespace as:

@prefix wd: <http://www.wikidata.org/entity/> .

in the RDF serializations.

e.g. (and I don't recommend this as it's LONG ... don't say I didn't warn you ...)

curl -L -H "Accept: text/n3" http://www.wikidata.org/entity/Q42

(You can | head -35 to grab the prefixes off the top)
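
(An equivalent check sketched in Python, assuming the requests and rdflib libraries; it still downloads the whole document, but prints only the declared Wikidata prefixes.)

```python
# Sketch: mirror the curl example above and print the wd* prefix declarations
# from the N3 returned for Q42. Assumes requests and rdflib are installed.
import requests
from rdflib import Graph

resp = requests.get(
    "http://www.wikidata.org/entity/Q42",
    headers={"Accept": "text/n3"},
    timeout=60,
)
g = Graph()
g.parse(data=resp.text, format="n3")

for prefix, namespace in g.namespaces():
    if prefix.startswith("wd"):
        print(f"@prefix {prefix}: <{namespace}> .")
# At the time of this thread, wd: comes back as <http://www.wikidata.org/entity/>
```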

edwardanderson commented 4 months ago

It seems like the "Semantic View" URIs for AAT resources discovered through search have recently (?) switched from http to https, but these then point at the http one.

azaroth42 commented 4 months ago

@edwardanderson True, but in the representation it's still http. E.g. https://vocab.getty.edu/aat/300194222.jsonld is served via https, but the subject of that representation is http://vocab.getty.edu/aat/300194222
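
(A small way to verify this, sketched with requests and rdflib 6+, which bundles a JSON-LD parser; the expected output reflects the behaviour described in this thread, not a guarantee about what Getty serves today.)

```python
# Sketch: fetch the JSON-LD representation over https and look at the subject
# URIs actually asserted in it. Assumes requests and rdflib 6+.
import requests
from rdflib import Graph, URIRef

url = "https://vocab.getty.edu/aat/300194222.jsonld"
g = Graph()
g.parse(data=requests.get(url, timeout=30).text, format="json-ld")

subjects = {str(s) for s in g.subjects() if isinstance(s, URIRef)}
print("http://vocab.getty.edu/aat/300194222" in subjects)   # expected: True
print("https://vocab.getty.edu/aat/300194222" in subjects)  # expected: False
```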

It's a pervasive and super frustrating issue :( And compounded by people using the human view rather than the canonical URI in the data (e.g. aat/page/300194222 instead of aat/300194222)

In terms of "solve at the right level" this is a data concern, and we could create some helpful tooling around it to fix URIs in bad data (e.g. consider: https://github.com/project-lux/data-pipeline/blob/main/pipeline/config.py#L168-L234 ) and document what we expect ... but data is what it is, and all we can do is whack the moles when they show up.

workergnome commented 4 months ago

And here at Getty we're aware of this, and as we work to improve the Vocabs over the next year or so we're trying to think about what we can do to solve the UX issues here; as Rob said, the conflict between Cool URLs and changes in browser/internet security practices over the past decade is a thorny issue.

azaroth42 commented 4 months ago

@workergnome Let's chat next week -- and we could have that chat in public (again) too if you want, given the topic of the meeting :)

beaudet commented 4 months ago

Sounds informative. Where do I get tickets?