sunra / php-simple-html-dom-parser

PHP Simple HTML DOM Parser adaptation for Composer and PSR-0

file_get_contents(): stream does not support seeking #48

Open litofunes opened 7 years ago

litofunes commented 7 years ago

(1/1) ErrorException file_get_contents(): stream does not support seeking

$html = HtmlDomParser::file_get_html('http://www.google.com/');

foreach($html->find('a') as $element) echo $element->href . '<br>';

markebjones commented 7 years ago

From: http://php.net/manual/en/function.file-get-contents.php

"The offset where the reading starts on the original stream. Negative offsets count from the end of the stream.

Seeking (offset) is not supported with remote files. Attempting to seek on non-local files may work with small offsets, but this is unpredictable because it works on the buffered stream."

HtmlDomParser::file_get_html uses a default offset of -1, which makes file_get_contents try to seek on the remote stream. Passing 0 as the offset should fix your problem.
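To make the offset behaviour concrete, here is a minimal sketch that uses a local temporary file so it runs without network access. The commented-out last line is an assumption: it presumes the bundled 1.5 parser takes the offset as the fourth argument of file_get_html(), so check the signature in your installed version before relying on it.

```php
<?php
// Write a small local file so the example needs no network access.
$path = tempnam(sys_get_temp_dir(), 'offset_demo_');
file_put_contents($path, 'Hello, world');

// Offset 0 reads from the start of the stream, so no seek is required.
$whole = file_get_contents($path, false, null, 0);

// A positive offset seeks forward first; local files support this,
// remote (http/https) streams generally do not.
$tail = file_get_contents($path, false, null, 7);

unlink($path);

echo $whole, "\n";
echo $tail, "\n";

// Assumption: offset is the 4th parameter of file_get_html() in the 1.5 parser,
// so the suggested fix would look roughly like this (not run here):
// $html = HtmlDomParser::file_get_html('http://www.google.com/', false, null, 0);
```

The point is only that a non-zero offset forces a seek, and a seek is what the remote stream refuses.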

dseegers commented 5 years ago

But let's say I want to grab a page using this:

$file_name = file_get_contents("https://google.com");

$dom = HtmlDomParser::file_get_html($file_name);

I get this, which is not really HTML. How can I fetch a page as HTML, and how can I fix this? :)

file_get_contents(<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="nl"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta

XTard commented 5 years ago

Hello there! May I suggest a better way to achieve that: using cURL. I've had compatibility issues too, but most of them were solved when I switched to cURL for pulling pages. Here is some code and a source:

include('./libs/php/simple_html_dom.php'); // Provides str_get_html()

function request($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    // Return the response body as a string instead of printing it directly
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // You can add other options too (e.g. timeout, method, etc.)
    $str = curl_exec($curl); // The page as a string, or false on failure
    curl_close($curl); // Make sure to end your session
    // Only translate the string to an object if the request succeeded
    return $str === false ? false : str_get_html($str);
}
// Save the resulting DOM object to a variable we will use later
$dom = request("https://www.google.com");

if ($dom !== false) {
    foreach ($dom->find("a") as $element)
        echo $element->href;
}

PHP cURL

dseegers commented 5 years ago

hi @XTard,

First of all, thanks for the reply 👍 :). I am using the HTML DOM parser (https://simplehtmldom.sourceforge.io) in the first place, but I was wondering why the Laravel wrapper doesn't work as expected. I already have a local script using the simple-dom-parser, but it would be fun if it worked in Laravel.

XTard commented 5 years ago

@dseegers I'm still not sure that I'm getting it right, but let me try one more time. You are using this piece of code, right?

$file_name = file_get_contents("https://google.com");

$dom = HtmlDomParser::file_get_html($file_name);

(1) The problem here is that file_get_contents returns the file as a string. You need to convert that string into an HTML object with str_get_html (as my cURL example does), but instead you are passing the string to file_get_html, which expects a URL or a file path. (2) Either pull the page directly with $dom = HtmlDomParser::file_get_html("https://www.google.com/"); or keep file_get_contents and use str_get_html instead.
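A small stdlib-only sketch of why the error message above starts with file_get_contents(<!doctype html>...: the markup ends up being used as a file name. The temp file stands in for the remote page so nothing here touches the network, and the two wrapper calls at the bottom are left as comments because they need the library installed.

```php
<?php
// Stand-in for a fetched page: a local temp file instead of a remote URL.
$path = tempnam(sys_get_temp_dir(), 'page_');
file_put_contents($path, '<html><body>Hello!</body></html>');

// file_get_contents() returns the document's contents as a string...
$markup = file_get_contents($path);

// ...so handing that string to anything that expects a path or URL makes PHP
// look for a file literally named "<html><body>...", which does not exist.
// The @ only hides the warning; the call still fails.
$reopened = @file_get_contents($markup);

unlink($path);

var_dump(is_string($markup)); // the markup itself is a plain string
var_dump($reopened);          // the second read did not succeed

// With the wrapper, the two valid pairings would therefore be (not run here):
// $dom = HtmlDomParser::str_get_html($markup);                    // string in
// $dom = HtmlDomParser::file_get_html('https://www.google.com/'); // URL in
```

So the rule of thumb is: string goes to str_get_html, URL or path goes to file_get_html.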

(1) php.net:

This function is similar to file(), except that file_get_contents() returns the file in a string, starting at the specified offset up to maxlen bytes. On failure, file_get_contents() will return FALSE.

file_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by your OS to enhance performance.

(2) PHP Simple HTML DOM Parser Manual:

// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Create a DOM object from a HTML file
$html = file_get_html('test.htm');