Import vulnerabilities

gkellogg commented 3 years ago

The import feature creates vulnerabilities similar to the JSON-LD remote context loading. In the case of JSON-LD, the document loader provides a means of avoiding accessing remote resources, although it's still come under a fair amount of criticism (See https://github.com/w3c/json-ld-syntax/issues/108 and https://github.com/w3c/json-ld-api/issues/14 for example).

A man-in-the-middle attack could cause different systems to receive different documents at different times.
Accessing the remote resource presents an opportunity to track usage and leak intention.
Routinely accessing remote resources can place a burden on the host (e.g., schema.org).
There is no facility for embedded use to avoid the remote lookup.
A malicious service can cause a stack overflow by automatically creating nested documents.
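
As a rough illustration of the JSON-LD-style document loader mentioned above, here is a minimal Python sketch of a loader that only serves schemas preloaded by the application and refuses everything else. The names `make_offline_loader` and `RemoteAccessDisabled` and the example IRI are hypothetical; nothing here is part of the ShEx spec or of any existing implementation.

```python
from typing import Callable, Dict


class RemoteAccessDisabled(Exception):
    """Raised when an IMPORT target has no locally supplied copy."""


def make_offline_loader(preloaded: Dict[str, str]) -> Callable[[str], str]:
    """Return a loader that resolves import IRIs from a local map only.

    `preloaded` maps an import IRI to the schema text the application
    chose to trust (hypothetical helper, for illustration).
    """
    def load(iri: str) -> str:
        try:
            return preloaded[iri]
        except KeyError:
            # No network fallback: an unknown IRI is an error, not a fetch.
            raise RemoteAccessDisabled(f"no local copy of {iri}") from None

    return load


# Hypothetical usage: the application ships its own copy of the schema.
loader = make_offline_loader({
    "http://schema.example/base": "<S> { <p> . }",
})
```

A loader like this would sidestep the tracking, host-burden and man-in-the-middle concerns for the IRIs it covers, at the cost of the application having to keep its local copies current.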

The spec should address this concern and/or provide mitigations. One area that JSON-LD may pursue in the future is the use of integrity checks (ala https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity).

ericprud commented 3 years ago

The import feature creates vulnerabilities similar to the JSON-LD remote context loading. In the case of JSON-LD, the document loader provides a means of avoiding accessing remote resources, although it's still come under a fair amount of criticism (See w3c/json-ld-syntax#108 and w3c/json-ld-api#14 for example).

I suspect that ShEx's issues are almost identical to JSON-LD's issues (or owl:imports, or XInclude). Can we just crib your solution when you get there?

A man-in-the-middle attack could cause different systems to receive different documents at different times.

Accessing the remote resource presents an opportunity to track usage and leak intention.

Routinely accessing remote resources can place a burden on the host (e.g., schema.org)

I think these three vulnerabilities are the same as for dereferencing the initial document (be it a schema, JSON-LD doc, OWL ontology). Web Arch caveat emptor?

There is no facility for embedded use to avoid the remote lookup.

I think embedding could help efficient caching of mutable resources but you'd never want the embedded form to trump the dereferenced. If you want the embedded form to win, you don't need the IMPORT (or @context dereference).

A malicious service can cause a stack-overflow by automatically creating nested documents.

Yeah, by having a language where IMPORTs can have IMPORTs, we have an exploitable recursion.

The spec should address this concern and/or provide mitigations. One area that JSON-LD may pursue in the future is the use of integrity checks (ala https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity).

Fascinating. We could implement that like:

IMPORT <http://important.example/hacked/schema> INTEGRITY "md5-1234"
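
For what it's worth, here is a minimal Python sketch of what checking such a value might look like on the client side, assuming the string follows the Subresource Integrity shape of `<algorithm>-<base64 digest>`. The `INTEGRITY` keyword is only the hypothetical syntax floated above, and the function names are made up for illustration. Note that SRI itself is defined over sha256, sha384 and sha512 rather than md5, so this sketch rejects anything else.

```python
import base64
import hashlib
import hmac

# Algorithms defined for Subresource Integrity; md5 is deliberately absent.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def integrity_of(body: bytes, algorithm: str = "sha384") -> str:
    """Compute an SRI-style string, e.g. 'sha384-<base64 digest>'."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(body: bytes, integrity: str) -> bool:
    """Return True only if `body` hashes to the declared integrity value."""
    algorithm, _, expected = integrity.partition("-")
    if algorithm not in ALLOWED_ALGORITHMS or not expected:
        return False
    actual = integrity_of(body, algorithm)
    # Constant-time comparison of the two encoded strings.
    return hmac.compare_digest(actual.encode("utf-8"), integrity.encode("utf-8"))
```

An importer would call `verify_integrity` on the fetched bytes before parsing them and treat a mismatch as a failed import, which addresses the man-in-the-middle case but not tracking or host load.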

ericprud commented 2 years ago

The import feature creates vulnerabilities similar to the JSON-LD remote context loading. In the case of JSON-LD, the document loader provides a means of avoiding accessing remote resources, although it's still come under a fair amount of criticism (See w3c/json-ld-syntax#108 and w3c/json-ld-api#14 for example).

I suspect that ShEx's issues are almost identical to JSON-LD's issues (or owl:imports, or XInclude). Can we just crib your solution when you get there?

poking @gkellogg to see what we can steal from JSON-LD 1.1

A malicious service can cause a stack-overflow by automatically creating nested documents.

Yeah, by having a language where IMPORTs can have IMPORTs, we have an exploitable recursion.

Come to think of it, that's pretty much the same as generating an infinite schema, modulo more per-request cost in non-pipelined HTTP connections.

I think the biggest vulnerability would have been to clients with a less-than-graceful handling of circular imports, but that's explicitly tested in 2RefS1-Icirc, where 2RefS1-Icirc circularly imports 2RefS1-Icirc and 2RefS2-Icirc.
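
To make the two cases concrete, here is a minimal Python sketch of import resolution that tolerates circular imports and also caps the number of documents, so an endlessly nesting service hits a limit instead of exhausting the stack. It is iterative rather than recursive. `resolve_imports`, `get_imports`, `max_documents` and the limit of 100 are assumptions for illustration, not anything the spec or the test suite prescribes.

```python
from typing import Callable, List, Set


class ImportLimitExceeded(Exception):
    """Raised when a schema pulls in more documents than the caller allows."""


def resolve_imports(
    root: str,
    get_imports: Callable[[str], List[str]],
    max_documents: int = 100,
) -> List[str]:
    """Return every document reachable from `root`, each visited once.

    `get_imports(iri)` is a hypothetical callback returning the IRIs that
    the document at `iri` imports.
    """
    seen: Set[str] = set()
    order: List[str] = []
    stack: List[str] = [root]
    while stack:
        iri = stack.pop()
        if iri in seen:
            continue  # circular or repeated import: already handled
        if len(seen) >= max_documents:
            raise ImportLimitExceeded(f"more than {max_documents} documents")
        seen.add(iri)
        order.append(iri)
        # Reverse so imports are visited in their declared order.
        stack.extend(reversed(get_imports(iri)))
    return order
```

With a circular graph such as `{"A": ["A", "B"], "B": ["A"]}` this visits `A` and `B` once each. A service that generates a fresh import on every request trips `ImportLimitExceeded` instead.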

The spec should address this concern and/or provide mitigations. One area that JSON-LD may pursue in the future is the use of integrity checks (ala https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity).

Fascinating. We could implement that like:
IMPORT <http://important.example/hacked/schema> INTEGRITY "md5-1234"

Did JSON-LD ever provide any guidance like that?

shexSpec / spec

Import vulnerabilities #43