w3c / webappsec-subresource-integrity

WebAppSec Subresource Integrity
https://w3c.github.io/webappsec-subresource-integrity/
Other
70 stars 35 forks source link

The algorithm for parsing metadata should be described in more detail #84

Open baek9 opened 4 years ago

baek9 commented 4 years ago

3.4.3. Parse metadata describes an algorithm for integrity metadata parsing. We need to describe it in detail for every possible situation. This will unify the behavior of browsers implementing SRI. Chrome and Firefox, which currently implement SRI, have different behaviors when parsing integrity metadata.

For example, after parsing metadata that contains non-base 64 digest, Firefox returns it by the step 4 of the algorithm since the digest is not empty. Chrome, on the other hand, returns "no metadata" since digest is not valid. See more details in here.

This is just one example. So, this issue suggests that 3.4.3 Parse metadata should be described in more detail in order to prevent unknown cases.

annevk commented 4 years ago

Yeah, the validity/parsing step should be replaced by an algorithm that parses the string using primitives from https://infra.spec.whatwg.org/#strings. (Whenever we use ABNF this is the result. So we should stop doing that for normative requirements.)

It's also weird how that ABNF references CSP2, but only CSP1 is in the normative references. Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

baek9 commented 4 years ago

As @annevk say CSP2 is not in the normative reference even if it is referenced by 3.4.6 The integrity attribute. And @annevk also said :

Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

In fact, there is no need to decode the base64 encoded digest contained in the metadata. In the metadata parsing algorithm corresponding to 3.4.3 Parse metadata, It seems to mean checking only that the digest is in base64 according to the following ABNF grammar from CSP L2, section 4.2.

base64-value = 1*( ALPHA / DIGIT / "+" / "/" )*2( "=" ) <from [CSP L2, section 4.2]>
hash-expression = hash-algo "-" base64-value

In the integrity checking algorithm corresponding to 3.4.5 Does response match metadataList? and 3.4.1 Apply algorithm to response, We should return false when the digest and the base64 encoded value of the content's hash do not match. Also, 3.4.1 references base64 encoding algorithm from the RFC document, Section 4 of RFC 4648. And this is in the normative reference.

annevk commented 4 years ago

Ah okay, so you encode the hash using base64 and then compare it with the base64 value given here. We should have tests for the edge cases in that case, e.g., incorrect number of equal signs at the end.

domfarolino commented 4 years ago

So it seems like we should just define hash-expression as hash-algo "-" value, instead of hash-algo "-" base64value. That way, the parsing of non-base64 developer-supplied metadata will still populate the metadata set, and will inevitably fail to match the actual base64-encoded hashes. Does that sound right?

baek9 commented 4 years ago

That approach seems to eliminate the confusion that can occur in implementing SRI without modifying the algorithm. It will also fix the bug that @domfarolino originally raised. However, the spec defines digest as "the base64-encoded result of executing a cryptographic hash function on an arbitrary block of data". Does the definition need to be changed? If so, the original author's opinion is also needed. Thank you.

domfarolino commented 4 years ago

That makes sense, someone else's opinion would indeed be good to get /cc @mikewest.

yutakahirano commented 4 years ago

@domfarolino's idea sounds good (though I'm not an expert either).

We may want to have a step which allows us to reject the request before making a network fetch when the value is not a valid base64-encoded string.

annevk commented 4 years ago

I think the current setup should be replaced by a proper processing model along the lines of my earlier comment. Tweaking the ABNF doesn't seem sufficient.

domfarolino commented 3 years ago

@baek9 Do you have any intentions on getting back to this PR, by any chance?

baek9 commented 2 years ago

@domfarolino As it is now, I am not familiar with the w3c spec, so I thought someone better could have addressed this issue. I'm submitting a new commit(I'm not sure if the commit are correct) for this issue that hasn't been done yet. To go further, I'd like to know if that commit is sufficient to address the feedback raised. The feedback I have in mind is:

  1. No use ABNF grammar by @annevk Previously, metadata had to be parsed according to the ABNF grammar. However, the base64-value doesn't actually need to be in base64 format, as the name suggests. I think this is the root cause of this issue.

  2. Fail-open for forward-compatibility by @mikewest User agent will load a script with integrity="gibberish" by 3.8 of the new algorithm, for instance, which seems reasonable behavior to keep, since it gives us grammatical flexibility in the future.

  3. option-expression The specification says that option are also included in the metadata.

baek9 commented 2 years ago

Yeah, the validity/parsing step should be replaced by an algorithm that parses the string using primitives from https://infra.spec.whatwg.org/#strings. (Whenever we use ABNF this is the result. So we should stop doing that for normative requirements.)

It's also weird how that ABNF references CSP2, but only CSP1 is in the normative references. Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

@annevk [CSP1] appears to have been removed from 68fd65e. Also, it seems that [CSP L3 2.3.1 Source Lists] can be used instead of [CSP L2 4.2 Source List Syntax]. Do we need to add [CSP L3 2.3.1 Source Lists] as a normative reference for 3.1 Integrity metadata and 3.5 The integrity attribute?

annevk commented 2 years ago

Yeah, referencing https://w3c.github.io/webappsec-csp/#grammardef-hash-source directly makes sense to me for both those sections. We can probably also simplify some of the prose to no longer talk about options. No need for that until it's actually a thing.