pc-magas / phpquery

Automatically exported from code.google.com/p/phpquery

phpQuery::newDocument sometimes cannot load multibyte HTML #70

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
-- What steps will reproduce the problem?

Try to parse the HTML page 'http://product.dangdang.com/product.aspx?product_id=175765'.

-- What is the expected output? What do you see instead?

I tried to query our product pages. phpQuery works for me most of the
time, but for some pages it cannot load the whole page.

For example: 'http://product.dangdang.com/product.aspx?product_id=175765'
Here is part of my code:

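  require_once 'phpQuery/phpQuery.php'; // adjust to your phpQuery install path
  $id  = 175765;
  $url = "http://product.dangdang.com/product.aspx?product_id=$id";
  $content = '';
  $fp = fopen($url, 'r');
  if ($fp === false) {
      die("Cannot open $url");
  }
  // Read until EOF; testing feof() avoids stopping early on a chunk that is "0"
  while (!feof($fp)) {
      $content .= fread($fp, 8192);
  }
  fclose($fp);
  phpQuery::$debug = true;
  $doc = phpQuery::newDocument($content);
  echo $doc; // debug: print the parsed document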

I found that the document is not loaded completely: more than half of the
document content is lost. I tried setlocale(LC_ALL, 'zh_CN'), but it
didn't help at all.

-- What version of the product are you using? On what operating system?
phpQuery-0.9.4-rc1 on Windows XP Professional 2002

-- Please provide any additional information below.

Original issue reported on code.google.com by huangka...@gmail.com on 24 Oct 2008 at 3:32

GoogleCodeExporter commented 9 years ago
PS: I've verified the $content variable and it's correct, but the $doc content is
fragmentary (the loading process seems to stop at a certain point).
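Echoing $content can look correct even when its bytes are not valid UTF-8 or do not match the charset the page declares, and a mismatch like that may cause the parser to stop partway through. A minimal sketch for checking this, assuming PHP 7.1+ with the mbstring extension; declaredCharset() and isValidUtf8() are hypothetical helpers, not part of phpQuery.

```php
<?php
// Hypothetical helpers for inspecting raw bytes before parsing.
// Assumes PHP 7.1+ and the mbstring extension.

/**
 * Return the charset declared in a <meta> tag (upper-cased), or null.
 * Matches both <meta charset="..."> and
 * <meta http-equiv="Content-Type" content="text/html; charset=...">.
 */
function declaredCharset(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

/** True if the byte string is well-formed UTF-8. */
function isValidUtf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}
```

If the page declares a non-UTF-8 charset, or isValidUtf8($content) is false while the page claims UTF-8, the problem is in the input encoding rather than in phpQuery itself.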

Original comment by huangka...@gmail.com on 24 Oct 2008 at 3:36

GoogleCodeExporter commented 9 years ago
Hi,

This is probably an encoding error.
The page doesn't validate:
http://validator.w3.org/check?uri=http%3A%2F%2Fproduct.dangdang.com%2Fproduct.aspx%3Fproduct_id%3D175765&charset=(detect+automatically)&doctype=Inline&group=0
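If encoding is the cause, one option is to convert the page to UTF-8 before parsing. A sketch, assuming mbstring is installed and that the page uses a GB-family encoding such as GB2312/GBK (the thread does not confirm which); toUtf8() is a hypothetical helper. It also rewrites the first declared charset so the parser does not decode the already-converted bytes a second time as the old encoding.

```php
<?php
// Hypothetical helper: re-encode HTML to UTF-8 and update its <meta> charset.
// Assumes mbstring; $fromCharset must be a name mbstring knows (e.g. 'CP936' for GBK).

function toUtf8(string $html, string $fromCharset): string
{
    $utf8 = mb_convert_encoding($html, 'UTF-8', $fromCharset);

    // Replace only the first declared charset value with UTF-8,
    // so the declaration matches the bytes we now hold.
    return preg_replace(
        '/(<meta[^>]+charset=["\']?)[A-Za-z0-9_\-]+/i',
        '${1}UTF-8',
        $utf8,
        1
    );
}
```

Usage would be along the lines of phpQuery::newDocument(toUtf8($content, 'CP936')), with the source charset taken from the page's own declaration.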

Original comment by nicolas....@gmail.com on 24 Oct 2008 at 7:32

GoogleCodeExporter commented 9 years ago
To see DOMDocument errors related to the loaded document, use:
  phpQuery::$debug = 2;
  phpQuery::newDocumentFile('http://product.dangdang.com/product.aspx?product_id=175765');
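Outside phpQuery, the same libxml messages can be collected directly from DOMDocument, which helps show where parsing goes wrong. A sketch using only the standard dom and libxml functions; htmlParseErrors() is a hypothetical name.

```php
<?php
// Hypothetical helper: parse HTML with DOMDocument and return libxml's errors
// as strings instead of letting them surface as PHP warnings.

/** @return string[] one message per libxml error */
function htmlParseErrors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html); // note: PHP 8 rejects an empty string here

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}
```

The line numbers in the returned messages point into $html, so the first error near where $doc gets cut off is the one worth inspecting.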

I found a similar problem on bugs.php.net (with a solution at the end):
http://bugs.php.net/bug.php?id=36843
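One widely used workaround, which may or may not be the fix given in that bug report, is to declare the charset explicitly so that DOMDocument::loadHTML() does not fall back to a default encoding for undeclared input. A minimal sketch, assuming the input is already UTF-8; loadUtf8Html() is a hypothetical helper.

```php
<?php
// Hypothetical helper: load UTF-8 HTML with an explicit charset declaration.
// Assumes $utf8Html is already UTF-8 encoded.

function loadUtf8Html(string $utf8Html): DOMDocument
{
    // Prepended declaration tells libxml how to decode the bytes that follow.
    $hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($hint . $utf8Html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}
```

Because the prepended tag comes first, it should take precedence over a conflicting declaration later in the page, which is why the input needs to be converted to UTF-8 beforehand.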

Original comment by tobiasz....@gmail.com on 24 Oct 2008 at 7:48

GoogleCodeExporter commented 9 years ago
Thanks, tobiasz and nicolas. You've been very helpful!

Original comment by huangka...@gmail.com on 28 Oct 2008 at 3:35

GoogleCodeExporter commented 9 years ago

Original comment by tobiasz....@gmail.com on 28 Oct 2008 at 8:55