ressio / pharse

Fastest PHP HTML Parser

file_get_dom, runs forever #32

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What will reproduce the problem?
$html = file_get_dom('http://www.nhl.com/ice/schedulebyseason.htm');

What is the expected output? What do you see instead?
The script ran for more than 30 seconds and hit the fatal error below many times, 
so I called `set_time_limit(0);`. It has now been running for about 15 minutes.
"Fatal error: Maximum execution time of 30 seconds exceeded in 
C:\xampp\htdocs\hockey\ganon.php on line 238"

Which version are you using?
Ganon single file PHP5 (rev. #78)
PHP 5.4.7

Please provide any additional information below.
It worked with the URL used in the provided examples, 'code.google.com'.

Original issue reported on code.google.com by jonathon...@gmail.com on 8 Mar 2013 at 1:24

GoogleCodeExporter commented 9 years ago
This library is implemented using only PHP. Performance for big pages can 
definitely be improved. Feel free to contribute patches!

Thanks for your report, and sorry for the late response.

Original comment by niels....@gmail.com on 7 Apr 2013 at 3:15

GoogleCodeExporter commented 9 years ago
Well, it doesn't really matter to me; I was not and am not here to complain. I 
have moved on to a web scraper that works for me.

Other web scrapers, also built only in PHP, scrape the same page in under a second, 
so I do not think performance, per se, is the problem (and the page is not 
that big). I believe it is far more likely that the parser is stuck in an infinite loop.
Note: PHP Simple HTML DOM Parser has a problem with the page as well, but 
instead of running indefinitely, it seems to return null.

Original comment by jonathon...@gmail.com on 7 Apr 2013 at 3:28
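
One way to tell a slow parse from a true infinite loop, as debated above, is to time the parser on progressively longer prefixes of the page. The sketch below is only a diagnostic idea, not part of Ganon: `time_prefixes` is a hypothetical helper, and the usage comment assumes that Ganon's `str_get_dom` is available and that the page was saved locally as `schedule.html`. If the timings grow steadily with size, it is probably a performance problem; if one prefix suddenly never returns, some specific markup is likely triggering a loop.

```php
<?php
// Hypothetical diagnostic helper (not part of Ganon).
// $parse is any callable that accepts an HTML string.
// Returns an array mapping prefix length in bytes to parse time in seconds.
function time_prefixes($parse, $html, $steps = 4)
{
    $timings = array();
    $length = strlen($html);

    for ($i = 1; $i <= $steps; $i++) {
        // Parse the first i/steps of the document.
        $size = (int) floor($length * $i / $steps);

        $start = microtime(true);
        call_user_func($parse, substr($html, 0, $size));
        $timings[$size] = microtime(true) - $start;
    }

    return $timings;
}

// Usage sketch (assumptions: ganon.php is in the current directory and
// the page was saved beforehand as schedule.html):
//
//   require 'ganon.php';
//   set_time_limit(60); // fail fast instead of running forever
//   print_r(time_prefixes('str_get_dom', file_get_contents('schedule.html')));
```

Saving the page to disk first also takes the network out of the picture, so any remaining delay comes from parsing alone.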