scottmac / opengraph

Helper class for accessing the OpenGraph Protocol

Why not use curl instead ? #4

Open ccazette opened 13 years ago

ccazette commented 13 years ago

Simple change... A lot more compatible, with far fewer issues than file_get_contents (allow_url_fopen must be on, plus other issues I couldn't even sort out when fetching remote content via file_get_contents() on my production server)...

Just a suggestion, as I made the change on my own and things work fine for me now.

/**
 * Fetches a URI and parses it for Open Graph data, returns
 * false on error.
 * @param $URI URI to page to parse for Open Graph data
 * @return OpenGraph
 */
static public function fetch($URI) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URI);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    return self::_parse($contents);
}
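A slightly more defensive variant of the same idea, as a rough sketch: the docblock promises false on error, and curl_exec() itself returns false when the transfer fails, so it may be worth checking the result before handing it to the parser. The helper name fetch_body and the timeout values below are assumptions made for illustration; they are not part of the library.

```php
<?php
/**
 * Fetch a page body with cURL, returning false on any failure.
 * Hypothetical helper (not part of the OpenGraph class).
 *
 * @param string $uri URI of the page to fetch
 * @return string|false Page body, or false if the transfer failed
 */
function fetch_body($uri)
{
    $ch = curl_init();
    if ($ch === false) {
        return false;
    }

    curl_setopt($ch, CURLOPT_URL, $uri);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // assumed value: seconds to wait for a connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed value: seconds allowed for the whole transfer

    // curl_exec() returns false on failure when CURLOPT_RETURNTRANSFER is set.
    $contents = curl_exec($ch);

    // No explicit curl_close(): the handle is released when $ch goes out of scope.
    if ($contents === false || $contents === '') {
        return false;
    }
    return $contents;
}
```

Inside fetch(), the result could then be checked first, returning false when fetch_body($URI) is false and calling self::_parse() only on a real body.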

tmaiaroto commented 12 years ago

I've used cURL in my fork if you like.

Argonalyst commented 12 years ago

Well, actually, I don't know why, but a lot of websites don't let you fetch their contents using cURL... I tried to use this cURL function to get the contents from http://nytimes.com, as an example, and I just can't... yet with the regular file_get_contents I was successful in retrieving the Open Graph data... cURL could be faster, but file_get_contents currently handles a greater range of websites, in my experience.

tmaiaroto commented 12 years ago

That's interesting. I wonder if there are any options that could be passed to cURL to change that. Maybe those sites try to prevent scraping, so passing a user agent or something similar with the request might help.

MitchellMcKenna commented 11 years ago

In pull request #8, with the cURL options I have set, I have found I actually get more results back with cURL than with file_get_contents(). nytimes.com worked fine. I did notice that some websites require a user agent to be set; they didn't seem to care what it was set to, as long as it was set, so I set it to $_SERVER['HTTP_USER_AGENT'].
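For anyone following along, here is a minimal sketch of what setting a user agent on the request could look like. The exact options used in pull request #8 are not quoted in this thread, so everything here apart from CURLOPT_USERAGENT and $_SERVER['HTTP_USER_AGENT'] is an assumption: the function names, the fallback string, the redirect handling, and the timeout are all illustrative. The fallback exists because $_SERVER['HTTP_USER_AGENT'] is typically not set when the script runs from the command line or cron.

```php
<?php
/**
 * Pick a User-Agent: forward the visitor's if present, otherwise use a fallback.
 * Hypothetical helper; the fallback string is an arbitrary placeholder.
 */
function pick_user_agent(array $server, $fallback = 'OpenGraphFetcher/1.0')
{
    if (isset($server['HTTP_USER_AGENT']) && trim($server['HTTP_USER_AGENT']) !== '') {
        return $server['HTTP_USER_AGENT'];
    }
    return $fallback;
}

/**
 * Fetch a page with an explicit User-Agent header.
 * Hypothetical helper; returns the body, or false if the transfer failed.
 */
function fetch_with_user_agent($uri, $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $uri,
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent, // some sites reject requests without one
        CURLOPT_FOLLOWLOCATION => true,       // assumption: follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,          // assumption: cap on redirects followed
        CURLOPT_TIMEOUT        => 15,         // assumption: seconds for the whole transfer
    ));
    return curl_exec($ch); // false on failure
}
```

A call would look something like fetch_with_user_agent($URI, pick_user_agent($_SERVER)), with the result passed on to the parser only when it is not false.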

feelsickened commented 9 years ago

Hi guys, I'm not the most advanced user, but this opengraph script works a treat for all the sites I'm working with, except nytimes.com. I've migrated from the version that used file_get_contents over to cURL, and have even tried setting my own HTTP_USER_AGENT. For example:

$user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36';
curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);

Result is always: array(1) { [0]=> string(5) "title" } NULL title => Log In - The New York Times

What workarounds/tricks have resolved this for you?