wuchuanbin / phpquery

Automatically exported from code.google.com/p/phpquery

DOMDocumentWrapper #135

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I'm Japanese and my English is limited, so I have written only the code.

---contentTypeFromHTML-----
> @<meta[^>]+http-equiv\\s*=\\s*(["|\'])Content-Type\\1([^>]+?)>@i

This pattern does not match a tag whose http-equiv value is unquoted and comes last, such as:
<META content="text/html;charset=Shift_JIS" http-equiv=Content-Type>

Change to:
@<meta[^>]+http-equiv\\s*=\\s*(["|\']*)Content-Type\\1([^>]*?)>@i

and

> @content\\s*=\\s*(["|\'])(.+?)\\1@

Change to (adds the i flag, so that an uppercase CONTENT= also matches):
@content\\s*=\\s*(["|\'])(.+?)\\1@i
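The difference between the patterns above can be checked with a small standalone script. This is only a sketch: the variable names are illustrative, and it assumes the content pattern is applied to the meta tag found by the first pattern.

```php
<?php
// Sample tag from the report: uppercase tag name, unquoted http-equiv value.
$tag = '<META content="text/html;charset=Shift_JIS" http-equiv=Content-Type>';

// Original pattern: needs a quote before Content-Type and at least one
// character between the value and the closing ">".
$original = '@<meta[^>]+http-equiv\\s*=\\s*(["|\'])Content-Type\\1([^>]+?)>@i';

// Proposed pattern: the quote and the trailing text are both optional.
$proposed = '@<meta[^>]+http-equiv\\s*=\\s*(["|\']*)Content-Type\\1([^>]*?)>@i';

$originalMatches = preg_match($original, $tag);
$proposedMatches = preg_match($proposed, $tag, $metaMatch);

// Extract the content attribute from the matched tag (case-insensitive).
preg_match('@content\\s*=\\s*(["|\'])(.+?)\\1@i', $metaMatch[0], $contentMatch);
$contentType = $contentMatch[2];

var_dump($originalMatches, $proposedMatches, $contentType);
```

With the sample tag, the original pattern finds nothing, while the proposed one matches and the content value can then be extracted.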

---charsetFixHTML-----

> @\s*<meta[^>]+http-equiv\\s*=\\s*(["|\'])Content-Type\\1([^>]+?)>@i

Change to:
@\s*<meta[^>]+http-equiv\\s*=\\s*(["|\']*)Content-Type\\1([^>]*?)>@i

and

> $headStart = stripos($markup, '<head>');
> $markup = substr($markup, 0, $headStart+6).$metaContentType
> .substr($markup, $headStart+6);

This code does not find a head tag that has attributes, such as:
<head profile="http://example.com"> etc...

Change to:
// Locate the opening head tag, with or without attributes.
preg_match('@<head[^>]*>@i', $markup, $matches, PREG_OFFSET_CAPTURE);
if (!isset($matches[0])) return $markup;
$headStart = $matches[0][1];          // offset of "<head" in $markup
$headEnd   = strlen($matches[0][0]);  // length of the whole opening tag
$markup = substr($markup, 0, $headStart+$headEnd).$metaContentType
    .substr($markup, $headStart+$headEnd);
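For testing outside the class, the replacement above can be wrapped in a standalone function. This is a minimal sketch: the function name insertAfterHeadTag and the sample markup are hypothetical and not part of phpQuery.

```php
<?php
// Hypothetical helper; in the report this logic sits inside charsetFixHTML.
function insertAfterHeadTag($markup, $metaContentType) {
    // Find the opening head tag (with or without attributes) and its offset.
    if (!preg_match('@<head[^>]*>@i', $markup, $matches, PREG_OFFSET_CAPTURE)) {
        return $markup; // no head tag found: leave the markup unchanged
    }
    // Insertion point = offset of the tag + length of the tag.
    $insertAt = $matches[0][1] + strlen($matches[0][0]);
    return substr($markup, 0, $insertAt) . $metaContentType . substr($markup, $insertAt);
}

$meta = '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">';
$html = '<html><head profile="http://example.com"><title>t</title></head></html>';
$result = insertAfterHeadTag($html, $meta);
$unchanged = insertAfterHeadTag('<p>no head here</p>', $meta);
echo $result, "\n";
```

One caveat: the pattern <head[^>]*> also matches a tag such as <header> when no <head> tag precedes it, so a stricter pattern may be preferable for markup that can contain such tags.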

Original issue reported on code.google.com by msd.s...@gmail.com on 6 Nov 2009 at 3:54

GoogleCodeExporter commented 8 years ago
Thank you for the report and the solution.

Original comment by tobiasz....@gmail.com on 7 Nov 2009 at 11:54