prebid / prebid-server

Open-source solution for running real-time advertising auctions in the cloud.
https://prebid.org/product-suite/prebid-server/
Apache License 2.0

Parsing oRTB.site.domain when no http referer is provided #2300

Closed (vfedoseev closed 1 year ago)

vfedoseev commented 2 years ago

Hi,

While checking the Prebid Server integrations on several sites, we noticed an issue with the 'no-referrer' meta tag.

The /auction endpoint currently checks the following values in the bid request:

If any of them is not present, the Http.Referer header is parsed and the results are passed to the 'Site' object: https://github.com/prebid/prebid-server/blob/master/endpoints/openrtb2/auction.go#L1525

Publishers sometimes use the following meta tag on their sites, which prevents browsers from sending the Http.Referer header with any request: <meta name="referrer" content="no-referrer">

If 'site.domain' is also missing from the bid request payload, no domain is derived or passed to the adapters. Example request below:

{
    "id": "ddd21d93-6c5d-458a-9cc2-0452330702e6",
    "source": {
        "tid": "ddd21d93-6c5d-458a-9cc2-0452330702e6"
    },
    "tmax": 3000,
    "imp": [
        ....
    ],
    "ext": {
        "prebid": {
            "auctiontimestamp": 1657713316889,
            "targeting": {
                "includewinners": true,
                "includebidderkeys": false
            },
            "channel": {
                "name": "pbjs",
                "version": "v5.20.4"
            }
        }
    },
    "cur": [
        "EUR"
    ],
    "site": {
        "publisher": {
            "id": "1"
        },
        "page": "https://test.somepage.com"
    }
}

So, the final request to the /auction endpoint:

We suggest parsing Site.Page to extract the Site.Domain value when Http.Referer is empty and Site.Page is available.
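To make the suggestion concrete, here is a minimal Go sketch of deriving a domain from the site.page value in the example request. It is not the PBS-Go implementation: the `site` struct, `domainFromPage`, and `fillDomainFromPage` are hypothetical names used only for illustration, and only the standard `encoding/json` and `net/url` packages are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// site mirrors only the site fields this sketch needs (hypothetical type,
// not the struct PBS-Go uses).
type site struct {
	Domain string `json:"domain,omitempty"`
	Page   string `json:"page,omitempty"`
}

// domainFromPage returns the host part of page, or "" if page cannot be
// parsed or carries no host.
func domainFromPage(page string) string {
	u, err := url.Parse(page)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

// fillDomainFromPage decodes a site object and, when site.domain is empty,
// derives it from site.page. An existing site.domain is left untouched.
func fillDomainFromPage(raw []byte) (site, error) {
	var s site
	if err := json.Unmarshal(raw, &s); err != nil {
		return site{}, err
	}
	if s.Domain == "" {
		s.Domain = domainFromPage(s.Page)
	}
	return s, nil
}

func main() {
	// The site object from the example request above.
	raw := []byte(`{"publisher": {"id": "1"}, "page": "https://test.somepage.com"}`)
	s, err := fillDomainFromPage(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s.Domain)
}
```

With the example payload, the sketch fills site.domain with the host of site.page even though no Referer header was sent.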

bretg commented 2 years ago

PBS-Java sets site.domain and site.publisher.domain:

site.domain: full domain, e.g. www.example.com or sports.usatoday.com
site.publisher.domain: 'rounded off' site.domain, e.g. example.com or usatoday.com

We'll review how these values are set and make sure they're in sync with Go.

bsardo commented 2 years ago

PBS-Go current logic:

if site.page is not set {
    if http.referer is set and http.referer is a valid URL {
        site.page = http.referer
    }
}
if site.domain is not set {
    if http.referer is set and http.referer is a valid URL {
        site.domain = http.referer.host   (i.e. the example.com portion of http://cool.example.com)
    }
}

Note: we never set site.publisher.domain. If it is specified in the original request though it will be passed through.

PBS-Go logic with proposed change:

if site.page is not set {
    if http.referer is set and http.referer is a valid URL {
        site.page = http.referer
    }
}
if site.domain is not set {
    if http.referer is set and http.referer is a valid URL {
        site.domain = http.referer.host   (i.e. the example.com portion of http://cool.example.com)
    } else if http.referer is not set and site.page is set and site.page is a valid URL {
        site.domain = site.page.host   (i.e. the example.com portion of http://cool.example.com)
    }
}

Note: for http.referer or site.page to be considered valid when trying to set the domain, it must be either an absolute ([scheme]://) or a scheme-relative (//) URL.

Also, site.page is required to be set while site.domain is not.
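The proposed precedence can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the code in auction.go: `hostOf` and `resolveSite` are hypothetical helpers, validity is approximated as "`net/url` parses the value and finds a host" (which holds for `scheme://` and `//` forms), and the sketch returns the full host without any trimming to a shorter domain.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostOf returns the host of raw when raw is an absolute (scheme://) or
// scheme-relative (//) URL, and "" otherwise.
func hostOf(raw string) string {
	if raw == "" {
		return ""
	}
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

// resolveSite applies the proposed rules to the incoming site.page and
// site.domain values plus the Referer header, and returns the resulting
// page and domain.
func resolveSite(page, domain, referer string) (string, string) {
	refHost := hostOf(referer)

	// site.page falls back to a valid referer.
	if page == "" && refHost != "" {
		page = referer
	}

	if domain == "" {
		if refHost != "" {
			// A valid referer takes precedence.
			domain = refHost
		} else if referer == "" {
			// New branch: no referer at all, so try site.page.
			domain = hostOf(page)
		}
	}
	return page, domain
}

func main() {
	page, domain := resolveSite("https://test.somepage.com", "", "")
	fmt.Println(page, domain)
}
```

Note that, following the pseudocode literally, a referer that is present but invalid does not trigger the site.page fallback.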

bretg commented 2 years ago

I've opened an internal ticket to have the PBS-Java team investigate and align Java to this approach.

bretg commented 2 years ago

FWIW, happened to run a test comparing PBS-Go and PBS-Java output. Here's the outcome regarding domains.

Referer: https://www.britannica.com/event/Seven-Years-War/Preliminary-negotiations-and-hostilities-in-the-colonies

Incoming site object:

{
    "site": {
        "publisher": {
            "id": "9262"
        },
        "page": "https://www.britannica.com/event/Seven-Years-War/Preliminary-negotiations-and-hostilities-in-the-colonies"
    }
}

I believe the PBS-Java results are correct:

    "site": {
        "domain": "www.britannica.com",
        "page": "https://www.britannica.com/event/Seven-Years-War/Preliminary-negotiations-and-hostilities-in-the-colonies",
        "publisher": {
            "id": "9262",
            "domain": "britannica.com"
        },
        "ext": {
            "amp": 0
        }
    },

PBS-Go hasn't been updated:

    "site": {
        "domain": "britannica.com",
        "page": "https://www.britannica.com/event/Seven-Years-War/Preliminary-negotiations-and-hostilities-in-the-colonies",
        "publisher": {
            "id": "9262"
        },
        "ext": {
            "amp": 0
        }
    },

bsardo commented 2 years ago

👍 The original problem description has been implemented but PBS-Go still needs to set site.publisher.domain if it's not set.

mansinahar commented 2 years ago

@AlexBVolcy Thanks for working on this. I noticed that PBS-Go currently sets Site.Domain to the highest-level domain, i.e. if the host is foo.bar.baz.com, it sets it to baz.com. However, based on the openrtb2 documentation, Site.Domain is supposed to be the full domain of the website (foo.bar.baz.com in the example above), and Site.Publisher.Domain is supposed to be the highest-level domain (just baz.com in the example above). This is how PBS-Java works as well, as you can see from the example @bretg posted above.
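The distinction between the two fields can be illustrated with a short Go sketch. It is an assumption-laden illustration rather than what either server does: `knownSuffixes` is a tiny stand-in for the Public Suffix List, and `registrableDomain` and `siteDomains` are hypothetical helpers. A real implementation would more likely consult the full list, for example through the `golang.org/x/net/publicsuffix` package.

```go
package main

import (
	"fmt"
	"strings"
)

// knownSuffixes is a tiny illustrative stand-in for the Public Suffix List.
var knownSuffixes = map[string]bool{"com": true, "org": true, "co.uk": true}

// registrableDomain returns the matching known suffix plus one label to its
// left, or host unchanged if no known suffix matches.
func registrableDomain(host string) string {
	labels := strings.Split(strings.ToLower(host), ".")
	// Try the longest candidate suffix first so "co.uk" wins over "uk".
	for i := 1; i < len(labels); i++ {
		if knownSuffixes[strings.Join(labels[i:], ".")] {
			return strings.Join(labels[i-1:], ".")
		}
	}
	return host
}

// siteDomains returns the value for site.domain (the full host) and the
// value for site.publisher.domain (the shortened domain).
func siteDomains(host string) (domain, publisherDomain string) {
	return host, registrableDomain(host)
}

func main() {
	domain, pubDomain := siteDomains("foo.bar.baz.com")
	fmt.Println(domain, pubDomain)
}
```

For the host foo.bar.baz.com this yields foo.bar.baz.com for site.domain and baz.com for site.publisher.domain, matching the split described above.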