rubensworks / jsonld-context-parser.js

Parses JSON-LD contexts
MIT License

Workaround for schema.org JSON-ld context #77

Closed: danielbeeke closed this issue 2 days ago

danielbeeke commented 2 days ago

schema.org is served from static infrastructure, as can be read here. Its JSON-LD context is not protocol-aware: it always declares the schema prefix as http://schema.org/, even when the context itself is fetched over https.

We use rdf-parse, which delegates JSON-LD context handling to jsonld-context-parser, which in turn downloads the schema.org context. As a result, our RDF output contains http://schema.org IRIs instead of https://schema.org ones.
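To see why the http variant ends up in the output, here is a deliberately simplified sketch of how a JSON-LD context turns terms into IRIs. It is not jsonld-context-parser's actual algorithm (it ignores keywords, object-valued term definitions, `@base` and more), and `expandTerm` is a hypothetical helper. The context values are the ones the workaround code below checks for.

```typescript
// Simplified model of a JSON-LD context: term -> IRI string.
type SimpleContext = Record<string, string>;

// Hypothetical helper illustrating term/compact-IRI expansion (not the library's code).
function expandTerm(term: string, context: SimpleContext): string {
  const has = (key: string): boolean => Object.prototype.hasOwnProperty.call(context, key);
  // A term defined directly in the context maps to its value.
  if (has(term)) {
    return context[term];
  }
  const colon = term.indexOf(':');
  if (colon > 0) {
    const prefix = term.slice(0, colon);
    // Compact IRI "prefix:suffix": prepend the IRI the prefix is mapped to.
    if (has(prefix)) {
      return context[prefix] + term.slice(colon + 1);
    }
    // Unknown prefix: treat the value as an absolute IRI such as "https://...".
    return term;
  }
  // Bare terms without their own definition fall back to @vocab, if set.
  return has('@vocab') ? context['@vocab'] + term : term;
}

// Values as published by schema.org: http, not https.
const schemaOrgContext: SimpleContext = {
  '@vocab': 'http://schema.org/',
  schema: 'http://schema.org/',
};

console.log(expandTerm('schema:name', schemaOrgContext)); // http://schema.org/name
console.log(expandTerm('Person', schemaOrgContext)); // http://schema.org/Person
```

Whatever protocol the context document was fetched with, the expanded IRIs inherit the protocol written inside the context.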

A fix could be applied in the load method of FetchDocumentLoader:

```typescript
// Inside FetchDocumentLoader.load, where `url`, `response` and `mediaType` are in scope.
if (mediaType === 'application/ld+json') {
    // Return JSON-LD if proper content type was returned
    const jsonLdContext = await response.json();
    const context = jsonLdContext['@context'];

    // Workaround: the schema.org context always declares http IRIs,
    // so switch them to https for this specific context URL.
    if (url === 'https://schema.org/docs/jsonldcontext.jsonld' && context) {
        if (context['@vocab'] === 'http://schema.org/') {
            context['@vocab'] = 'https://schema.org/';
        }
        if (context['schema'] === 'http://schema.org/') {
            context['schema'] = 'https://schema.org/';
        }
    }

    return jsonLdContext;
}
```

Would this be something that could be considered for jsonld-context-parser? I would also understand if it is not appropriate, as it is quite an ugly workaround.

rubensworks commented 2 days ago

I definitely understand the problems and the pain you're running into.

But since RDF considers two IRIs equal only if they match character by character, the http and https variants of the same IRI are strictly different. This could be solved through entailment regimes, e.g. by using owl:sameAs, but I'm not sure this library is the right place to apply such a workaround. Not everyone may want to default to the https variant. My view is that this tool is correctly following what the schema.org context says: use http-based IRIs.
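The two points above (strict character-by-character IRI equality, and owl:sameAs as a possible bridge) can be sketched as follows. This is an illustration only: `schemaOrgSameAsLinks` is a hypothetical helper that emits N-Triples lines, and whether the links have any effect depends entirely on the consumer applying OWL semantics.

```typescript
// RDF compares IRIs character by character, so these are two distinct IRIs.
const httpIri: string = 'http://schema.org/name';
const httpsIri: string = 'https://schema.org/name';
console.log(httpIri === httpsIri); // false

const OWL_SAME_AS = 'http://www.w3.org/2002/07/owl#sameAs';
const HTTP_SCHEMA = 'http://schema.org/';
const HTTPS_SCHEMA = 'https://schema.org/';

// Hypothetical helper: for every http schema.org IRI, emit one N-Triples line
// stating that it is owl:sameAs its https twin. Only a reasoner that applies
// OWL semantics will treat the two as the same resource; a plain store will not.
function schemaOrgSameAsLinks(iris: string[]): string[] {
  const lines: string[] = [];
  for (const iri of iris) {
    if (!iri.startsWith(HTTP_SCHEMA)) {
      continue;
    }
    const twin = HTTPS_SCHEMA + iri.slice(HTTP_SCHEMA.length);
    const line = `<${iri}> <${OWL_SAME_AS}> <${twin}> .`;
    if (lines.indexOf(line) === -1) {
      lines.push(line); // skip duplicates
    }
  }
  return lines;
}

console.log(schemaOrgSameAsLinks([httpIri, httpsIri]));
```

The data keeps the http IRIs the context dictates; equivalence is stated on top of it rather than baked into the parser.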

But given you have a workaround, this looks like a good solution that you can build into your application.

danielbeeke commented 2 days ago

Would it be enough to derive the intended protocol from the protocol of the requested JSON-LD context URL?

I mean: requesting https://schema.org/docs/jsonldcontext.jsonld should result in "schema": "https://schema.org/", and requesting http://schema.org/docs/jsonldcontext.jsonld should result in "schema": "http://schema.org/".
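A minimal sketch of this proposal, assuming the context document has already been parsed into an object. `alignSchemaOrgProtocol` is a hypothetical helper, not part of jsonld-context-parser; it only rewrites the `@vocab` and `schema` entries, and only when they point at schema.org.

```typescript
// Hypothetical helper: make the schema.org context follow the protocol
// of the URL it was requested from. Returns a new object; input is not mutated.
function alignSchemaOrgProtocol(requestUrl: string, doc: Record<string, any>): Record<string, any> {
  const { protocol, hostname } = new URL(requestUrl);
  const context = doc['@context'];
  // Only touch schema.org contexts that are plain objects.
  if (hostname !== 'schema.org' || !context || typeof context !== 'object' || Array.isArray(context)) {
    return doc;
  }
  // URL.protocol includes the trailing ':' (e.g. "https:").
  const base = `${protocol}//schema.org/`;
  const patched: Record<string, any> = { ...context };
  for (const key of ['@vocab', 'schema']) {
    const value = patched[key];
    if (value === 'http://schema.org/' || value === 'https://schema.org/') {
      patched[key] = base;
    }
  }
  return { ...doc, '@context': patched };
}
```

Returning a copy keeps any cached original intact, so the same downloaded context could serve both http and https requests.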

> but I'm not sure this library here is the right place to apply this workaround.

True, I'm not sure either, although there are already other schema.org workarounds in the constructor.

> But given you have a workaround, this looks like a good solution that you can build into your application.

True, we can just use patch-package; however, it might be nice to improve this situation for everyone.

rubensworks commented 1 day ago

> True we can just use patch-package

It should also be possible to implement your own document loader and plug it into this library. Then you don't need to patch anything.
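A minimal sketch of that suggestion, under stated assumptions: the `DocumentLoader` interface below is a local stand-in assumed to mirror the library's `IDocumentLoader` (a single `load(url)` method returning a promise of the parsed document), and `HttpsSchemaOrgDocumentLoader` is a hypothetical name. It wraps another loader instead of patching `FetchDocumentLoader`.

```typescript
// Local stand-in, assumed to match the shape of jsonld-context-parser's IDocumentLoader.
interface DocumentLoader {
  load(url: string): Promise<any>;
}

const SCHEMA_ORG_CONTEXT_URL = 'https://schema.org/docs/jsonldcontext.jsonld';

// Hypothetical decorator: delegates fetching to another loader, then switches
// the schema.org @vocab and schema prefix to https for that one context URL.
class HttpsSchemaOrgDocumentLoader implements DocumentLoader {
  private readonly inner: DocumentLoader;

  constructor(inner: DocumentLoader) {
    this.inner = inner;
  }

  async load(url: string): Promise<any> {
    const doc = await this.inner.load(url);
    const context = doc && doc['@context'];
    if (url === SCHEMA_ORG_CONTEXT_URL && context && typeof context === 'object') {
      for (const key of ['@vocab', 'schema']) {
        if (context[key] === 'http://schema.org/') {
          context[key] = 'https://schema.org/';
        }
      }
    }
    return doc;
  }
}
```

In an application, the wrapper would presumably be composed with the library's own loader and handed to the parser, e.g. new ContextParser({ documentLoader: new HttpsSchemaOrgDocumentLoader(new FetchDocumentLoader()) }); check the option name against the jsonld-context-parser README. Because it is a separate class, only applications that choose it get the https rewrite.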

> however it might be nice to improve this situation for everyone.

The main issue here is that not everyone may see this as an improvement; some people may actually depend on the current behaviour. As such, I don't want this library to force a workaround upon them. Those who do want it can opt in to a custom workaround.