serp-spider / search-engine-google

:spider: Google client for SERPS
https://serp-spider.github.io

Transfer-Encoding: chunked with proxy enabled #112

Open migliori opened 5 years ago

migliori commented 5 years ago

Hi,

I've been using search-engine-google for a while and it worked perfectly until now, but I've recently run into an issue with proxies.

The Google scraper works fine on my localhost, but on the production server it throws a 500 error: Unable to check javascript status

The scraped results come with dom => textContent starting with "ncoding Transfer-Encoding: chunked"

I put a simple test online here: https://www.hack-hunt.com/scraping-simple-test.php. The code is taken from your example here: http://serp-spider.github.io/documentation/search-engine/google/#installation

I just added: `$proxy = Proxy::createFromString('https://xxx:proxy@ip'); $browser->setProxy($proxy);`

It works fine on localhost, and on the production server if I remove the proxy, but it fails on production with the proxy.

Not sure if the issue comes from my server or search-engine-google.

Any help much appreciated, thanks

LunarDevelopment commented 5 years ago

Can you post the contents of your local composer.json and your production composer.json and composer.lock?

migliori commented 5 years ago

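composer.zip https://github.com/serp-spider/search-engine-google/files/2449573/composer.zip

The files are the same on local and on the server.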

LunarDevelopment commented 5 years ago

Hm, looks alright. Can you check that your production server IP is whitelisted and isn't blocked by the proxy service?

gsouf commented 5 years ago

@migliori Please check your curl version. If the curl version on the server differs from your local one, please try upgrading it and let us know what happens.

migliori commented 5 years ago

No, it isn't. If you open https://www.hack-hunt.com/scraping-simple-test.php you'll see the string added before the Google content: `public 'textContent' => string 'ncoding Transfer-Encoding: chunked

simpsons - Recherche Google(function(){window.google=...`

I suspected that the headers might be added by the Apache PageSpeed module, but disabling it made no difference.

I can't change my PHP curl version; it's built into the Plesk PHP:

    version => 7.26.0
    ssl_version => OpenSSL/1.0.1t
    libz_version => 1.2.7

I just tested with nginx instead of Apache: same result.
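One way to check whether the stray header text is already present in the raw response body (rather than being introduced later, during DOM parsing) is to save a body to a file and look at what comes before the first `<`. The following is only a minimal sketch under stated assumptions: it assumes a clean HTML body begins with `<`, and `leading_text` and the file name `body.html` are hypothetical names, not part of search-engine-google.

```shell
#!/bin/sh
# leading_text FILE: print whatever precedes the first "<" in FILE.
# Assumption: a clean HTML body starts with "<" (e.g. "<!doctype html>"),
# so any output here is text that leaked in ahead of the markup.
leading_text() {
    awk '
        { i = index($0, "<") }                                  # position of first "<" on this line, 0 if none
        i > 0 { printf "%s", substr($0, 1, i - 1); exit }       # print the prefix of that line and stop
        { print }                                               # no "<" yet: the whole line is leading text
    ' "$1"
}

# Hypothetical usage, with body.html holding a saved response body:
#   leading_text body.html
```

If the function prints something like the `Transfer-Encoding: chunked` fragment quoted above, the header text is part of the body as received, which points at the HTTP layer (curl and the proxy) rather than at the HTML parsing.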

gsouf commented 5 years ago

@migliori not php-curl, just curl itself. Run `curl --version`

migliori commented 5 years ago

I already did it:

    apt-get update && apt-get install curl libcurl
    curl --version
    curl 7.26.0 (x86_64-pc-linux-gnu) libcurl/7.26.0 OpenSSL/1.0.1t zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3
    Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
    Features: Debug GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

gsouf commented 5 years ago

Your version of curl is very old. Try to upgrade to version 7.61 and see if it works.

Additionally, curl versions below 7.48 have an issue with cookies that prevents SERPS from working correctly with them.
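To compare an installed curl against the 7.48 threshold mentioned above, a small shell check can help. This is a sketch, not something shipped with SERPS: `version_lt` and `check_curl_version` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# version_lt A B: succeed (exit 0) when version A is strictly lower than B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_curl_version VERSION MINIMUM: print a one-line verdict.
check_curl_version() {
    if version_lt "$1" "$2"; then
        echo "curl $1 is older than $2"
    else
        echo "curl $1 meets the $2 minimum"
    fi
}

# Example with the version reported in this thread:
check_curl_version 7.26.0 7.48.0

# To test the curl binary actually installed on a machine:
#   check_curl_version "$(curl --version | awk 'NR==1 {print $2}')" 7.48.0
```

The same check should be run on both the local machine and the production server, since the thread turns on the two environments behaving differently.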

migliori commented 5 years ago

I'm in touch with my server provider and will let you know whether it works as soon as the upgrade is done. It may take 1 or 2 days.

Thanks so much for your responsiveness and help.