spekulatius / PHPScraper

A universal web-util for PHP.
https://phpscraper.de
GNU General Public License v3.0

Idea: Directly exposing received headers #172

Open spekulatius opened 1 year ago

spekulatius commented 1 year ago
Thanks, so it actually depends on the header. So far headers haven't been processed much. Exposing them would be beneficial in general. What do you think?

_Originally posted by @spekulatius in https://github.com/spekulatius/PHPScraper/pull/164#discussion_r1045676088_

spekulatius commented 1 year ago

Hey @eposjk,

To separate the topics a bit better, I'd suggest continuing the conversation on headers here.

Yes, they could be helpful.

But how should we expose them? When we look at https://github.com/symfony/browser-kit/blob/6.2/Response.php, we see that `$web->client->getResponse()->getHeader('Some-Header')` normalizes the header name (case and -_), while `$web->client->getResponse()->getHeaders()` returns all headers unnormalized. And `$web->client->getResponse()->getHeader('Some-Header', false)` returns an array that may contain multiple headers of the same kind.
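
To make the three cases concrete, here is a minimal sketch of the lookup behavior as described above. `lookupHeader()` is a hypothetical stand-in written in plain PHP (8.0+) so it runs without Symfony installed; it imitates the described behavior and is not the library's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in that imitates the lookup described above.
 * It is not the Symfony implementation.
 *
 * @param array<string, string|string[]> $headers Raw headers as received.
 * @return string|string[]|null
 */
function lookupHeader(array $headers, string $name, bool $first = true): string|array|null
{
    // Compare names case-insensitively and treat "-" and "_" as equal.
    $normalize = static fn (string $h): string => str_replace('-', '_', strtolower($h));
    $wanted = $normalize($name);

    foreach ($headers as $key => $value) {
        if ($normalize((string) $key) !== $wanted) {
            continue;
        }
        $values = is_array($value) ? array_values($value) : [$value];

        // $first = true: a single value; $first = false: every value as a list.
        return $first ? ($values[0] ?? '') : $values;
    }

    // Nothing matched: null for a single lookup, an empty list otherwise.
    return $first ? null : [];
}

$raw = [
    'content-type' => 'text/html; charset=UTF-8',
    'Set-Cookie'   => ['a=1', 'b=2'],
];

var_dump(lookupHeader($raw, 'Content_Type'));      // single string
var_dump(lookupHeader($raw, 'set-cookie', false)); // list with both cookies
var_dump(lookupHeader($raw, 'X-Missing'));         // NULL
```

The raw array is never modified here; only the comparison is normalized, so the original spelling of each header name stays available.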

Yeah, that's a good point on the conversion. The processing in `$web->client->getResponse()->getHeader('Some-Header')` appears to be quite useful: it normalizes the name and returns a result depending on the type.

To match the current naming, I would rename `getHeaders()` to `headersRaw()` (for those wanting to access every detail). The definition and usage of `getHeader()` make sense too. Why not expose it directly as it is?

I think it also makes sense to have some basic tests to ensure the behavior of the underlying library doesn't change. The three cases above, run against some example page, should do.

What are the use cases?

* check one header (e.g. Date, Last-Modified, Expires, Content-Language)

* store all headers to use them later (normalized with `$normalizedHeader = ucwords(strtolower($header), '-_')` ?)
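
As a rough illustration of the second use case, the sketch below applies the quoted `ucwords(strtolower($header), '-_')` expression to every key of a header array before storing it. `normalizeHeaderNames()` is a hypothetical helper name, not something PHPScraper provides.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: normalizes header names with the expression quoted above.
 *
 * @param array<string, mixed> $headers
 * @return array<string, mixed>
 */
function normalizeHeaderNames(array $headers): array
{
    $normalized = [];
    foreach ($headers as $name => $value) {
        // Lower-case first, then upper-case the first letter of each
        // segment delimited by "-" or "_".
        $normalized[ucwords(strtolower((string) $name), '-_')] = $value;
    }

    return $normalized;
}

$stored = normalizeHeaderNames([
    'content-language' => 'en',
    'LAST-MODIFIED'    => 'Wed, 21 Oct 2015 07:28:00 GMT',
]);

print_r(array_keys($stored)); // Content-Language, Last-Modified
```

Two things to keep in mind with this expression: "-" and "_" stay distinct (`x_custom` becomes `X_Custom`), and if two names collide after normalization the later value overwrites the earlier one in this sketch.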

Do we need to support multiple headers of the same type? There seem to be two notations for setting multiple values for a header: sending multiple headers with the same name, or folding them into a single header separated by ", ". Folding seems to be allowed for all headers except a deprecated form of the Set-Cookie header which uses the Expires=... parameter (instead of the newer Max-Age parameter). What about folding all values in our getHeaders() function, transforming Set-Cookie+Expires to Set-Cookie+Max-Age, and normalizing the names as described above?
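
A minimal sketch of the folding idea, assuming the raw headers arrive as an array whose values are either a string or a list of strings. `foldHeaders()` is a hypothetical helper; it leaves Set-Cookie as a list because an Expires date contains a comma of its own, which is what makes folding that header ambiguous.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: folds repeated headers into one ", "-separated value.
 * Set-Cookie is left as a list because its Expires attribute contains a comma.
 *
 * @param array<string, string|string[]> $headers
 * @return array<string, string|string[]>
 */
function foldHeaders(array $headers): array
{
    $folded = [];
    foreach ($headers as $name => $value) {
        $values = is_array($value) ? array_values($value) : [$value];
        $isCookie = strcasecmp((string) $name, 'Set-Cookie') === 0;

        // Every header except Set-Cookie is joined into a single string.
        $folded[$name] = $isCookie ? $values : implode(', ', $values);
    }

    return $folded;
}

$folded = foldHeaders([
    'Cache-Control' => ['no-cache', 'no-store'],
    'Set-Cookie'    => ['a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT', 'b=2'],
]);

var_dump($folded['Cache-Control']); // one folded string
var_dump($folded['Set-Cookie']);    // still a list of two cookies
```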

Multiple headers of the same type could come up. Normalizing the data is fine, as we expose the raw data in case people want to tweak things. I haven't found anything about the deprecated header on the page. The conversion from Set-Cookie+Expires to Set-Cookie+Max-Age should be fine, as long as it's documented properly.
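
For reference, the Expires to Max-Age conversion could look roughly like the sketch below. `expiresToMaxAge()` is a hypothetical helper, and passing the current time in as `$now` is an assumption made here so the result is reproducible; none of this exists in PHPScraper yet.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: rewrites "Expires=<date>" in a Set-Cookie value to
 * "Max-Age=<seconds>", relative to $now (a Unix timestamp).
 */
function expiresToMaxAge(string $cookie, int $now): string
{
    return preg_replace_callback(
        '/;\s*Expires=([^;]+)/i',
        static function (array $m) use ($now): string {
            $expires = strtotime(trim($m[1]));
            if ($expires === false) {
                return $m[0]; // Unparseable date: keep the attribute unchanged.
            }

            // A date in the past becomes Max-Age=0.
            return '; Max-Age=' . max(0, $expires - $now);
        },
        $cookie
    ) ?? $cookie;
}

// Fixed clock (21 Oct 2015, 07:28:00 GMT) to keep the example reproducible.
$now = gmmktime(7, 28, 0, 10, 21, 2015);

echo expiresToMaxAge('id=abc; Expires=Wed, 21 Oct 2015 08:28:00 GMT; Path=/', $now), PHP_EOL;
// prints "id=abc; Max-Age=3600; Path=/"
```

This sketch does not check whether the cookie already carries a Max-Age attribute, so that case would need a decision (and a line in the docs) before exposing anything like it.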