php-http / client-common

Common HTTP Client implementations and tools for HTTPlug
http://httplug.io
MIT License
1.03k stars 54 forks source link

Circular redirection detected #212

Closed fgtham closed 2 years ago

fgtham commented 3 years ago

PHP version: 7.4.16

Description

I'm trying to fetch an article from https://www.nature.com/articles/d41586-019-01307-2. The article ist behind several redirects:

$ curl --location -D- -v https://www.nature.com/articles/d41586-019-01307-2
> GET /articles/d41586-019-01307-2 HTTP/2
< HTTP/2 303 
> GET /authorize?response_type=cookie&client_id=grover&redirect_uri=https%3A%2F%2Fwww.nature.com%2Farticles%2Fd41586-019-01307-2 HTTP/2
< HTTP/2 302 
> GET /transit?redirect_uri=https%3A%2F%2Fwww.nature.com%2Farticles%2Fd41586-019-01307-2&code=4026f248-b493-47ae-9e9c-2b6cd7587d29 HTTP/2
< HTTP/2 302 
> GET /articles/d41586-019-01307-2?error=cookies_not_supported&code=4026f248-b493-47ae-9e9c-2b6cd7587d29 HTTP/2
< HTTP/2 200 

The original URI and final location URI only differ in the absence and presence of the query string, the path is identical. This triggers a "Circular redirection detected" exception in lines 171..174 of client-common/src/Plugin/RedirectPlugin.php:

            $this->circularDetection[$chainIdentifier][] = (string) $request->getUri();

            if (in_array((string) $redirectRequest->getUri(), $this->circularDetection[$chainIdentifier])) {
                throw new CircularRedirectionException('Circular redirection detected', $request, $response);
            }

As the query string is not taken into account, the exception is thrown while following the final redirect.

Additional context

I am not using client-common directly, rather I'm trying to store the article in wallabag, which appears to be using client-common.

dbu commented 3 years ago

the query string is indeed relevant. imagine some old style php cms that does index.php?page=abc...

i am a bit confused though: does $redirectRequest->getUri() not include the query string? according to https://www.php-fig.org/psr/psr-7/ (section 3.5) the UriInterface includes the query string. is the (string) casting of the Uri object not including the query parameters in your case?

fgtham commented 3 years ago

I modified the exception message to output the URI:

                throw new CircularRedirectionException('Circular redirection detected ' . (string) $redirectRequest->getUri(), $request, $response);

The log says:

[2021-07-26 10:53:54] graby.WARNING: Request throw exception (with a response): Circular redirection detected https://www.nature.com/articles/d41586-019-01307-2 {"error_message":"Circular redirection detected https://www.nature.com/articles/d41586-019-01307-2"} []

So, apparantly, the query string is not included in (string) $redirectRequest->getUri(), at least not in the php-http/client-common version included in wallabag. CHANGELOG.md says it's 2.3.0.

dbu commented 3 years ago

client-common does not provide a PSR 7 implementation. can you please check what PSR-7 implementation you are using? that typically would be guzzlehttp/psr7 or nyholm/psr7, but could be a bunch of others: https://packagist.org/providers/psr/http-message-implementation

fgtham commented 3 years ago

It appears to be guzzlehttp/psr7 v1.7.0, at least I can find the project in the vendor directory.

dbu commented 3 years ago

i just tried and i think there is some problem with your flow:

<?php

require('vendor/autoload.php');

use GuzzleHttp\Psr7\Uri;

$uri = new Uri('http://example.com/path?query');
echo (string) $uri;
echo "\n";
$ php test.php 
http://example.com/path?query

so we do get the query string. could it be that nature in the end also redirects back to a url without query string? i suggest you print each redirection to see what exactly is going on.

does the login work with cookies? we might need to look into the cookies header as that could be the difference. however, that would increase the risk of an endless loop a lot, if you somehow keep getting redirected and keep receiving new cookies (e.g. a new session)

fgtham commented 3 years ago

I don't understand. The problem seems to originate in $request->getUri() above. How does this relate to GuzzleHttp's Uri()?

I dumped the redirect locations in the report above, the final redirect has the query string error=cookies_not_supported&code=4026f248-b493-47ae-9e9c-2b6cd7587d29.

The article is available without login. I assume the redirect orgy checks whether the article is freely available or locked away. I can fetch the URI with curl without providing any cookies.

dbu commented 3 years ago

request->getUri returns a PSR-7 uri, thats why i checked that.

we seem to not have a test for the redirect plugin. can you try to reproduce the problem with a unit test in https://github.com/php-http/client-common/tree/master/tests/Plugin ? then we can be more sure if it really is not working or if somehow your case is something special?

fgtham commented 3 years ago

I'm sorry, but this is beyond my php-fu :(

dbu commented 3 years ago

i think that there must be some more redirect going on that makes the plugin see a loop, as the query string is part of the request. it probably is legitimate if you can get to the page in the web browser and not end up in an endless redirect. but i would need a failing test to understand what is happening, i can't see from your issue description how that would fail.

dbu commented 2 years ago

this should be fixed in https://github.com/php-http/client-common/releases/tag/2.5.1 now.

can you confirm and close the issue if its ok, please?