Add source headers to hosting API calls

fershad commented 11 months ago

Is your feature request related to a problem? Please describe. Not a problem, but something that would help us (Green Web Foundation) later down the line to understand where API requests are coming from.

Describe the solution you'd like Add a custom header x-greencheck-src to the API fetch requests made to the Greencheck API.

https://github.com/thegreenwebfoundation/co2.js/blob/51b85e23dc1f52ac6642259c09e56fef62948087/src/hosting-api.js#L23-L25

https://github.com/thegreenwebfoundation/co2.js/blob/51b85e23dc1f52ac6642259c09e56fef62948087/src/hosting-api.js#L41

There should be a default value x-greencheck-src: "co2js" but there should be a mechanism for that to be changed to a value that's set by the user.
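As a rough illustration of the default-plus-override idea, here is a minimal sketch. The helper names `buildSourceHeaders` and `checkDomain`, the `source` option, and the endpoint URL are all assumptions made for this example; they are not the existing co2.js code.

```javascript
// Hypothetical sketch, not the co2.js implementation.
// Default value sent when the caller does not identify itself.
const DEFAULT_SOURCE = "co2js";

// Build the headers object; `source` falls back to the default when omitted.
function buildSourceHeaders(source = DEFAULT_SOURCE) {
  return { "x-greencheck-src": source };
}

// Hypothetical wrapper around the API call. The URL is assumed for illustration,
// and `fetchImpl` is injectable so the function can be exercised without a network.
async function checkDomain(domain, { source, fetchImpl = fetch } = {}) {
  const url = `https://api.thegreenwebfoundation.org/greencheck/${domain}`;
  const response = await fetchImpl(url, { headers: buildSourceHeaders(source) });
  return response.json();
}
```

A caller that wants to self-report would pass something like `checkDomain("example.com", { source: "my-tool" })`; everyone else gets the default.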

mrchrisadams commented 11 months ago

Fish, given the traffic we get I'm very much in favour of this, but would you mind adding a bit of info outlining where the x-greencheck-src: "co2js" syntax came from? I get the x- prefix for unofficial headers, but I'm less confident commenting on the rest because I'm not so familiar with comparable prior work.

I think there may be existing conventions we can refer to for identifying API clients or user agents, that have been implemented in various tooling for parsing logs, rate-limiting and so on.

This would save us work further down the line if we follow an existing convention or spec for handling API traffic.

fershad commented 11 months ago

Maybe @philsturgeon might have some idea.

Phil, we're hoping to introduce a way to see how many checks against the Green Web Dataset API are coming from different tools/providers.

I've suggested the idea of a custom x- header here as a way for folks to self-report when sending a request. Outside of an API key solution, is there any other convention for how we might be able to do this?

philsturgeon commented 11 months ago

From what I understand of the requirements, you can use the User-Agent header for this. That's exactly what it's for!

mrchrisadams commented 11 months ago

Yeah, thanks @philsturgeon - I agree that the closest thing is likely the User Agent Header - it's in the HTTP spec as a SHOULD:

14.43 User-Agent

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests. The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent. By convention, the product tokens are listed in order of their significance for identifying the application.
   User-Agent     = "User-Agent" ":" 1*( product | comment )
Example:
   User-Agent: CERN-LineMode/2.15 libwww/2.17b3

It's true that it's mainly used for browsers, but we know scraper/crawler bots use it too.

MDN also shares examples of API clients and other binaries, such as curl or PostmanRuntime, sending it in the docs below:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

This leaves open the question of what the User Agent string ought to be by default. I'm less sure here.

My starting point might be the library name and version - we could go with something like co2.js/0.13.1 - for example.

This would give us an idea of what API support might be across the clients hitting our API if we introduced new features over time, and it only uses characters we know are already used in other UA strings.
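To make the "which versions are hitting us" idea concrete, here is a rough sketch of tallying library versions from logged UA strings. The function names and the sample input shape are hypothetical, and the parsing is a simplification of the product/comment grammar quoted above (it just drops parenthesised comments and splits on whitespace).

```javascript
// Hypothetical sketch: split a User-Agent string into product tokens.
// Parenthesised comments are removed first; nested comments are not handled.
function parseProducts(userAgent) {
  const withoutComments = userAgent.replace(/\([^)]*\)/g, " ");
  return withoutComments
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => {
      // A product token looks like "name/version"; the version is optional.
      const [name, version = null] = token.split("/");
      return { name, version };
    });
}

// Count how many requests came from each version of a given product.
function countClientVersions(userAgents, productName) {
  const counts = new Map();
  for (const userAgent of userAgents) {
    const product = parseProducts(userAgent).find((p) => p.name === productName);
    if (!product) continue; // request did not come from this product
    const key = product.version ?? "unknown";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Fed a list of UA strings from the logs, `countClientVersions(list, "co2.js")` would return a `Map` keyed by version, while browser-style strings are simply skipped.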

Because we don't restrict access or require API keys right now, setting the UA explicitly would presumably at least give us a useful basis for understanding more about the sources of traffic.

Update: It might be the case that whatever runtime we use already sends something we can use. I just checked the logs, and we definitely get it for most of the traffic - or rather, requests sent mostly use the same UA strings as browsers, so we can't easily differentiate API traffic from browser traffic right now.

More below: https://blog.postman.com/what-are-http-headers https://en.wikipedia.org/wiki/User-Agent_header

This question on Stack Exchange was useful context, too: https://softwareengineering.stackexchange.com/questions/355670/does-it-make-sense-for-user-agent-to-be-required-for-rest-apis

philsturgeon commented 11 months ago

Anyone and everyone can set the user agent header; most just don’t bother. I generally get iOS clients to set it so I can see which versions people are using without invasive telemetry, things like that, so this seems like basically the same thing.

sfishel18 commented 10 months ago

👋 hello! i came across this project in the Frontend Focus newsletter, and if you're looking for new contributors i'd love to help out! this particular issue looks like a good introductory one. could i take a crack at it?

fershad commented 10 months ago

@sfishel18 thanks for reaching out. We'd love a PR for this. Based on the conversation between Chris and Phil above, here's a small spec:

Add a User-Agent header to the fetch requests in the original comment.
The header should have the value co2js/<version>, where <version> is the version number of the library being used to make the request.
Ideally, the version number should update by itself, without us having to manually change it every release.

Let me know if you get stuck or need a hand.
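For anyone picking this up, the spec above could look roughly like the sketch below. Everything named here (`LIBRARY_VERSION`, `getApiRequestHeaders`, `checkWithUserAgent`, the endpoint URL) is an assumption for illustration. In particular, the version is hard-coded only to keep the example self-contained; to meet the "updates by itself" point it would more likely be read from package.json or injected at build time.

```javascript
// Hypothetical sketch, not the final co2.js implementation.
// Assumption: the real value would come from package.json or a build step.
const LIBRARY_VERSION = "0.13.1";

// Build the headers for API requests, in the co2js/<version> form from the spec.
function getApiRequestHeaders(version = LIBRARY_VERSION) {
  return { "User-Agent": `co2js/${version}` };
}

// Hypothetical request helper. The URL is assumed, and `fetchImpl` is
// injectable so the helper can be tested without touching the network.
async function checkWithUserAgent(domain, fetchImpl = fetch) {
  const response = await fetchImpl(
    `https://api.thegreenwebfoundation.org/greencheck/${domain}`,
    { headers: getApiRequestHeaders() }
  );
  return response.json();
}
```

One caveat worth checking during the PR: as far as I know, some browsers ignore attempts to override User-Agent from fetch, so this is likely to matter mostly for requests made from Node or similar runtimes.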

thegreenwebfoundation / co2.js

Add source headers to hosting API calls #181