spekulatius / PHPScraper

A universal web-util for PHP.
https://phpscraper.de
GNU General Public License v3.0

[Proposal] Add HTTP proxy support #118

Closed nathabonfim59 closed 1 year ago

nathabonfim59 commented 1 year ago

I'm working on a project that needs to change proxies constantly. Looking at the Goutte client, I found that the underlying implementation already supports this.

Research

  1. HttpBrowser, which Goutte is based on, accepts a custom client that implements HttpClientInterface (source code)

    ...
    class HttpBrowser extends AbstractBrowser
    {
        private $client;

        public function __construct(HttpClientInterface $client = null, History $history = null, CookieJar $cookieJar = null)
        ...
            $this->client = $client ?? HttpClient::create();
    ...
  2. HttpClient::create() supports a proxy through the $defaultOptions parameter, which is passed on to the selected HttpClient (source code)
        public static function create(array $defaultOptions = [], int $maxHostConnections = 6, int $maxPendingPushes = 50): HttpClientInterface
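The two points above can be combined as in the following minimal sketch. It assumes symfony/http-client and symfony/browser-kit are installed through Composer and that vendor/autoload.php sits next to the script; the proxy URL is a placeholder, not a working endpoint.

```php
<?php
// Assumption: Composer autoloader is available at this path.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

// Placeholder proxy URL for illustration only.
$proxy = 'http://user:password@127.0.0.1:3128';

// 'proxy' is set as a default option, so requests sent through
// this client are routed via the proxy unless overridden per request.
$httpClient = HttpClient::create(['proxy' => $proxy]);

// HttpBrowser accepts any HttpClientInterface implementation.
$browser = new HttpBrowser($httpClient);

// A request would then go through the proxy, for example:
// $crawler = $browser->request('GET', 'https://example.com');
```

Since the proxy is fixed when the client is created, switching proxies in this sketch means creating a new client and a new browser around it.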

Implementation details

The idea is to expose this functionality through a setProxy function in the core class. The library will continue to select the HTTP client dynamically.

I just made a proof of concept in my fork with all the code needed to make this work (a6589da).

public function setProxy(string $proxy)
{
    $httpClient = HttpClient::create([
        'proxy' => $proxy
    ]);

    $this->client = new Client($httpClient);

    return $this;
}

How to use

$web = new phpscraper;
$web->__call('setProxy', [
    'http://user:password@127.0.0.1:3128',
]);

If this feature gets approved, I will open a PR with it. If anything needs to be changed, let me know.

spekulatius commented 1 year ago

Hello @nathabonfim59

Thank you for opening an issue on this. Definitely a feature worth adding :muscle:

I wonder why you called it like this:

$web->__call('setProxy', [
    'http://user:password@127.0.0.1:3128',
]);

instead of:

$web->setProxy('user:password@127.0.0.1:3128');

Is there any particular reason for it?

Other than this question, it looks good. Could you open the PR so we can discuss any details that come up?

Cheers, Peter

nathabonfim59 commented 1 year ago

Actually, I misunderstood how this works (autocomplete issues) :sweat_smile:

You're right, the scheme defaults to HTTP. I'll open the PR right away.

spekulatius commented 1 year ago

Thank you @nathabonfim59 :muscle:

spekulatius commented 1 year ago

I've tagged and pushed a new version, 0.6.3. Please let me know if any issues come up :+1:

spekulatius commented 1 year ago

Hey @nathabonfim59

I've made the configuration steps more generic to support various configuration options. This means that, as of the latest version, the setProxy('...') call has been replaced with setConfig(['proxy' => '...']). Just a heads up :+1:
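A rough sketch of what the updated call might look like in practice. It assumes PHPScraper is installed through Composer and that the class is reachable as \Spekulatius\PHPScraper\PHPScraper; the namespace differs between releases, so check the installed version. The proxy URL is a placeholder.

```php
<?php
// Assumption: Composer autoloader is available at this path.
require __DIR__ . '/vendor/autoload.php';

// Assumption: namespace as used in recent releases; adjust if your version differs.
$web = new \Spekulatius\PHPScraper\PHPScraper;

// Takes the place of the earlier setProxy('...') call.
$web->setConfig(['proxy' => 'http://user:password@127.0.0.1:3128']);
```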

Cheers, Peter

nathabonfim59 commented 1 year ago

I took a look at the recent changes; it's a much more flexible approach. 👏

Thanks!