spatie / crawler

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.
https://freek.dev/308-building-a-crawler-in-php
MIT License

External links to LinkedIn (company pages) return HTTP status 999 #231

Closed. baristacoder closed this issue 5 years ago

baristacoder commented 5 years ago

Hi,

When the crawler finds a link to a LinkedIn page on a crawled page, it reports errors like: "Unsuccessful request: GET https://nl.linkedin.com/company/XXX resulted in a 999 Request denied response:"

Does anyone know a way to get a status 200 for those kinds of pages? I'm not trying to build a LinkedIn scraper; I only want to check whether an outgoing link is reachable.

DuskBrowserTest seems to work on a blank Laravel project, but it literally opens a Chrome instance to visit a URL.

freekmurze commented 5 years ago

They just deny access for crawlers. As I don't want this package to pretend it's not a crawler, I'll close this for now.

michaelaguiar commented 4 years ago

How do we get around the GuzzleHttp error "Status code must be an integer value between 1xx and 5xx." when the server returns 999?

michaelaguiar commented 4 years ago

Never mind, setCrawlProfile() does the trick!
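
The comment above doesn't say which profile was used. One plausible approach is a custom crawl profile that skips LinkedIn URLs, so the 999 response never reaches Guzzle. The sketch below is a guess at that setup, not the poster's actual code. `SkipLinkedInProfile` and the `https://example.com` start URL are hypothetical, and the `CrawlProfile` namespace is an assumption that depends on the installed spatie/crawler release.

```php
<?php

use Psr\Http\Message\UriInterface;
use Spatie\Crawler\Crawler;
// Assumption: newer releases use this namespace; older releases
// expose the base class as Spatie\Crawler\CrawlProfile instead.
use Spatie\Crawler\CrawlProfiles\CrawlProfile;

// Hypothetical profile: refuses linkedin.com and any of its subdomains.
class SkipLinkedInProfile extends CrawlProfile
{
    public function shouldCrawl(UriInterface $url): bool
    {
        $host = strtolower($url->getHost());

        // Matches "linkedin.com" itself and hosts ending in ".linkedin.com".
        $isLinkedIn = preg_match('/(^|\.)linkedin\.com$/', $host) === 1;

        return ! $isLinkedIn;
    }
}

Crawler::create()
    ->setCrawlProfile(new SkipLinkedInProfile())
    // Attach your own crawl observer here to record results.
    ->startCrawling('https://example.com'); // placeholder start URL
```

Note that this only avoids the error. LinkedIn links are skipped, not verified, so they would need to be reported separately as "not checked".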