Clarifications on processing expected_origins and processing steps

marcoscaceres commented 3 months ago

The spec says:

expected_origins: REQUIRED when signed requests defined in Appendix A.3.2 are used with the W3C Digital Credentials API [w3c.digital_credentials_api]. An array of strings, each string representing an origin of the Verifier that is making the request. The Wallet can detect replay of the request from a malicious Verifier by comparing values in this parameter to the origin asserted by the Web Platform.

However, the spec seems to be missing details of how expected_origins is to be processed.

What should happen when:

expected_origins is empty
- Issue: It's unclear whether the implementation should accept any origin or reject all requests when expected_origins is empty.
- Example Case: If expected_origins is an empty array, should the implementation treat this as allowing any origin or blocking all origins?
expected_origins contains an invalid origin or URLs
- Issue: Should the implementation reject the entire list or simply ignore invalid entries if it contains invalid origins?
- Example Case: If expected_origins contains entries like ["1://2", "https://foo.com"], should the presence of 1://2 cause the whole list to be rejected, or should it just ignore 1://2 and drop it on the floor?
expected_origins contains a non-https origin
- Issue: It is unclear whether non-https origins should be allowed.
- Example Case: If expected_origins includes ["gopher://foo:123"], should this entry be accepted or rejected?
expected_origins contains a relative or absolute path, or an empty string
- Issue: Should paths be resolved against a base URL or be considered invalid? How should empty strings be handled?
- Example Case: If expected_origins includes relative or absolute paths like [".", "/"] or an empty string like [""], should these be resolved against a base URL or rejected?
expected_origins contains URLs with paths, query, and/or fragments
- Issue: It is unclear whether origins with paths, query parameters, or fragments should be normalized (e.g., removing paths, queries, and fragments) or considered invalid.
- Example Case: If expected_origins contains URLs such as ["https://foo.com/path?query=1#fragment"], should these be normalized to just the origin or be rejected?
expected_origins contains duplicates
- Issue: Should duplicates be removed or cause the entire list to be rejected?
- Example Case: If expected_origins includes ["https://foo.com", "https://foo.com"], should the duplicates be removed or should this result in an error?
expected_origins contains non-string entries
- Issue: Should all entries in expected_origins be converted to strings? By implication, expected_origins should be defined as a WebIDL USVString to ensure proper handling of Unicode and type conversions.
- Example Case: If expected_origins includes entries like ["https://foo.com", null, "https://bar.com"], should non-string entries be converted to strings or cause an error?
Explanation: Defining expected_origins as a WebIDL USVString ensures that all entries are properly handled as strings, accommodating Unicode characters correctly. This is important because URLs can contain Unicode characters, and using USVString helps to normalize these characters to their Unicode Scalar Value (USV) form, ensuring consistency and preventing issues related to encoding and type mismatches.
Handling of Localhost and Intranet URLs
- Issue: Should URLs pointing to localhost or intranet addresses be treated with special consideration, especially regarding security implications?
- Example Case: If expected_origins contains ["https://localhost"] or ["https://192.168.1.1"], should these be allowed given they are local resources?
Normalization of URLs
- Issue: During normalization, the URLs should be stripped of their query parameters, fragments, and paths. The primary concern is ensuring that only the origin part (scheme, host, port) is used.
- Example Case: If expected_origins includes ["https://foo.com/path?query=1#fragment"], should it be normalized to ["https://foo.com"] before comparison?
Recommendation: Each URL should be parsed using the WHATWG URL parser, and only the origin should be added to a set to avoid duplicates. This ensures that the origins are consistently and correctly normalized, enhancing security and preventing ambiguity.

To ensure proper handling of expected_origins, we recommend defining it as follows in WebIDL:

dictionary OpenID4VPRequest {
  sequence<USVString> expected_origins;
};

Further, the spec is missing validation step. Here's a start:

Validator and Processor for `OpenID4VPRequest`

This section defines an algorithm to validate and process the expected_origins array in an OpenID4VPRequest dictionary. The output will be a set of normalized origins, safely discarding query, fragments, paths, etc.

Algorithm

Input:
- An OpenID4VPRequest dictionary |request|, containing:
  - expected_origins: A sequence of USVString.
Output:
- A set of normalized origins.
Steps:
1. Check if expected_origins is empty:
  1. If expected_origins is empty, throw a TypeError indicating that expected_origins must not be empty.
2. Let origins_set be an empty set.
3. For each origin in expected_origins:
  1. Parse the URL:
    1. Let url be the result of parsing origin using the URL parser with null as the URL base.
    2. If parsing origin fails, throw a TypeError indicating an invalid URL.
  2. Check scheme:
    1. If url's scheme is not "https", throw a TypeError indicating an unsupported URL scheme.
    2. ISSUE: Should localhost and intranet addresses be allowed? (e.g., https://localhost or https://192.168.1.1). Probably not.
  3. Extract origin:
    1. Let normalized_origin be the result of running the origin serialization steps on url.
  4. Check for duplicates:
    1. If origins_set contains normalized_origin, continue.
  5. Add to set:
    1. Add normalized_origin to origins_set.
4. Check if origins_set is empty:
  1. If origins_set is empty, throw a TypeError indicating that no valid origins were found.
5. Return the set:
  1. Return origins_set.

marcoscaceres commented 3 months ago

Noting that for JS/Web compatibility, the following are all technically legal and would be handled by the WebIDL conversion (which automatically converts everything to a USVString):

const expected_origins = [ 
   { toString() { return "https://foo.com"} }, 
   new URL("https://foo.com"), 
   "https://foo.com"
];
await navigator.identity.get({digital: {
  providers: [{expected_origins, /*... other props */}]}
});

danielfett commented 3 months ago

Not sure if we need to define the actual algorithm, but the text should be more normative and enforce those things listed by @marcoscaceres above, e.g.:

(...) MUST be a non-empty array of strings, each string representing an origin of the Verifier that is making the request. If the list is empty, contains elements that are not strings or do not represent origins or contain path or query, or fragment elements, or contains duplicated, the Wallet MUST reject the request.

RByers commented 3 months ago

If the DC API spec is going to require request validation then perhaps the OpenID4VP spec needs to be clear about which validation rules are appropriate to apply at the transport (browser/OS) layer, vs. which are strictly for a wallet to enforce? In particular, since browsers and OSes may take some time to update, we need to make sure they're not accidentally blocking the deployment of extensions or other improvements to OpenID4VP which wallets and RPs are looking to support.

marcoscaceres commented 3 months ago

On the browser side, we prefer algorithms because they are easier to read, test, and code against. English is a bit of crappy language for specs unfortunately, so short little statements works best for precision.

Would folks be interested if we put together a pull request for this as a start?

I'm new here, so not exactly how this works 🐣... @Sakurann or @tlodderstedt or @tplooker could you guide me? 🙈🙏

marcoscaceres commented 3 months ago

@danielfett, just so you can see what I mean, this is from the rough validator I'm working on for the DC API in WebKit to "process expected origins" (it's totally half-baked 🧑‍🍳 as I don't have a spec 🫠... but hopefully you get the idea of where I'm going with it):

String OpenID4VPRequest::processExpectedOrigins(Document& document)
{
    Vector<String> normalizedOrigins;

    for (auto& origin : expected_origins) {
        if (origin.isEmpty())
            return "Expected origin must not be empty."_s;

        auto url = SecurityOrigin::create(document.completeURL(origin));

        if (!url.isValid())
            return "Expected origin is not a valid URL."_s;

        if (!url.protocolIs("https"))
            return "Expected origin must be an HTTPS URL."_s;

        normalizedOrigins.append(url->toString());
    }
    return {};
}

For folks interested, here's the validator I'm working on that runs when the DC API is called on the browser. It runs immediately after javascript calls .get():

https://github.com/WebKit/WebKit/pull/32577/files#diff-685bce3af29979163981d54437c25b4e3965566cdb02010827019755e1d41d35

What's super cool is that each structure there is self validating, applying whatever rules we add to the spec ✨.

openid / OpenID4VP