Stumbled on this one while trying to scrape a government site, of all things. The
bad page mysteriously has no opening <html> tag (but does have a </html> at the
end!). Not sure whether this should count as an issue or not, but since it worked
fine in the July 2008 version (0.9.1), I figured I would submit a report.
What steps will reproduce the problem?
1. Create a document with <head> and <body> blocks, but do NOT wrap it in an <html> block.
2. Try to query inside the body block, or just print pq();
What is the expected output?
All content, including any blocks after the <head> block. In version 0.9.1,
this worked.
What do you see instead?
Nothing. The parser does not find or recognize <body> or any other block after the
<head>.
Test code:
$doc = '<head><title>SomeTitle</title>
</head>
<body bgcolor="#ffffff" text="#000000" topmargin="1" leftmargin="0">blah
</body>';
$pq = phpQuery::newDocument($doc);
echo $pq;
Of course, I can work around this in my own PHP for this version by prepending
'<html>' before passing the markup to phpQuery, but it worked before, and it seems
like it still should.
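For anyone hitting the same thing, the workaround could look roughly like the sketch below. The helper name `ensureHtmlWrapper` is hypothetical and not part of phpQuery. It also assumes there is no DOCTYPE ahead of the markup, which it makes no attempt to handle.

```php
<?php
// Hypothetical helper (not part of phpQuery): wrap markup in <html>...</html>
// when no opening <html> tag is present.
function ensureHtmlWrapper($markup)
{
    // An opening <html> tag, with or without attributes, means nothing to do.
    if (preg_match('/<html[\s>]/i', $markup)) {
        return $markup;
    }
    // Drop any stray closing </html> so the result is not closed twice.
    $markup = preg_replace('/<\/html\s*>/i', '', $markup);
    return '<html>' . $markup . '</html>';
}

$doc = '<head><title>SomeTitle</title>
</head>
<body bgcolor="#ffffff" text="#000000" topmargin="1" leftmargin="0">blah
</body>';

$fixed = ensureHtmlWrapper($doc);
// $fixed can then be passed to phpQuery::newDocument() as in the test code.
```

This only papers over the missing tag on the caller's side; it does not explain why the parser's behaviour changed after 0.9.1.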
Original issue reported on code.google.com by joey...@gmail.com on 7 Jan 2009 at 6:10