prebid / prebid-server

Open-source solution for running real-time advertising auctions in the cloud.
https://prebid.org/product-suite/prebid-server/
Apache License 2.0
425 stars 720 forks source link

Requests for stored requests http endpoint don't follow RFC 3986 #3595

Open MatrixDev opened 5 months ago

MatrixDev commented 5 months ago

From the Go server documentation (stored_requests/backends/http_fetcher/fetcher.go):

// Stored requests
// GET {endpoint}?request-ids=["req1","req2"]&imp-ids=["imp1","imp2","imp3"]

Below is an example of the actual request from the PBS to endpoint:

http://127.0.0.1/?request-ids=["0689a263-318d-448b-a3d4-b02e8a709d9d"]&imp-ids=["prebid-demo-banner-320-50"]

But according to RFC 3986 this is not a valid URI format because query cannot contain [, ] nor " characters:

    query       = *( pchar / "/" / "?" )
    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded   = "%" HEXDIG HEXDIG
    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

This causes problems when implementing stored requests endpoints with strict frameworks like Actix / Axum (probably others as well).

As a quick hack this fixes the problem but most probably it should be implemented by properly encoding query arguments:

func buildRequest(endpoint string, requestIDs []string, impIDs []string) (*http.Request, error) {
    if len(requestIDs) > 0 && len(impIDs) > 0 {
        return http.NewRequest("GET", endpoint+"request-ids=%5B%22"+strings.Join(requestIDs, "%22,%22")+"%22%5D&imp-ids=%5B%22"+strings.Join(impIDs, "%22,%22")+"%22%5D", nil)
//      return http.NewRequest("GET", endpoint+"request-ids=[\""+strings.Join(requestIDs, "\",\"")+"\"]&imp-ids=[\""+strings.Join(impIDs, "\",\"")+"\"]", nil)
    } else if len(requestIDs) > 0 {
        return http.NewRequest("GET", endpoint+"request-ids=%5B%22"+strings.Join(requestIDs, "%22,%22")+"%22%5D", nil)
//      return http.NewRequest("GET", endpoint+"request-ids=[\""+strings.Join(requestIDs, "\",\"")+"\"]", nil)
    } else {
        return http.NewRequest("GET", endpoint+"imp-ids=%5B%22"+strings.Join(impIDs, "%22,%22")+"%22%5D", nil)
        // return http.NewRequest("GET", endpoint+"imp-ids=[\""+strings.Join(impIDs, "\",\"")+"\"]", nil)
    }
}
bretg commented 5 months ago

@Net-burst to look at whether this is an issue for PBS-Java as well

Net-burst commented 5 months ago

This issue somewhat affects PBS-Java as of Java 20. PBS will just fail due to URL being malformed and won't be sending anything using non-compliant URL. @SerhiiNahornyi @And1sS FYI

bsardo commented 1 month ago

You're able to specify the same query param multiple times in the query string. When pulling this info from the query string you should get an array of all of the values for that param. In this case we would make the query param name singular.

https://pkg.go.dev/net/url#ParseQuery

We should support the old format as well.

stashkevi4 commented 1 month ago

@bsardo I'd like to clarify few things before start implementation:

  1. Default URL builder should be the same as in Java version, so requests will be built like this:
    // GET {endpoint}?request-id=req1&request-id=req2&imp-id=imp1&imp-id=imp2&imp-id=imp3

    With corresponding changes in all comments and docs.

  2. In order to support legacy (current) version new option should be added to the config:
    // HTTPFetcherConfig configures a stored_requests/backends/http_fetcher/fetcher.go
    type HTTPFetcherConfig struct {
    Endpoint    string `mapstructure:"endpoint"`
    AmpEndpoint string `mapstructure:"amp_endpoint"`
        UseLegacyRequestBuilder `mapstructure:"use_legacy_request_builder"`  // ???
    }

    Also, legacy request builder will have correctly encoded symbols using url.QueryEscape

  3. FetchAccounts function has the same problem with incorrectly encoded symbols. I assume it should follow the same logic.
  4. There's a potential bug in FetchCategories function. Endpoint could be with or without params embedded:

    urlPrefix := endpoint
    if strings.Contains(endpoint, "?") {
        urlPrefix = urlPrefix + "&"
    } else {
        urlPrefix = urlPrefix + "?"
    }
    
    return &HttpFetcher{
        client:   client,
        Endpoint: urlPrefix,
    }

In FetchCategories function we are assuming that there could be only Endpoints without params:

        var dataName, url string
    if publisherId != "" {
        dataName = fmt.Sprintf("%s_%s", primaryAdServer, publisherId)
        url = fmt.Sprintf("%s/%s/%s.json", strings.TrimSuffix(fetcher.Endpoint, "?"), primaryAdServer, publisherId)
    } else {
        dataName = primaryAdServer
        url = fmt.Sprintf("%s/%s.json", strings.TrimSuffix(fetcher.Endpoint, "?"), primaryAdServer)
    }

E.g. if Endpoint looks like this:

http://requests.endpoint.com/requests?foo=bar&

it will result in something like this:

http://requests.endpoint.com/requests?foo=bar&/primaryAdServer/pubId.json

Ideally, we should use the domain of the Endpoint here. Or at least replace strings.TrimSuffix with strings.Split(fetcher.Endpoint, "?")[0].

WDYT?