thagenbeek / phpquery

Automatically exported from code.google.com/p/phpquery

Multibyte xml documents #76

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Forward from phpQuery Google Group:
http://groups.google.com/group/phpquery/browse_thread/thread/8da2ebfac380133a

Quoting Evgeny Burzak.

> Hello,
> it seems that phpQuery does not work properly with XML documents in
> a multibyte encoding such as UTF-8. I have posted a patch here:
> http://groups.google.com/group/phpquery/web/phpQueryObject.php.patch.gz
> It replaces the standard string functions with multibyte ones.
> 
> Sample:
> // it works with mb_* patch
> $xml = phpQuery::newDocumentXML('<документа/>');
> 
> $xml['документа']->append('<список></список>');
> $xml['документа список'] = '<эл>1</эл><эл>2</эл><эл>3</эл>';
> print "<xmp>$xml</xmp>"; 

Original issue reported on code.google.com by tobiasz....@gmail.com on 6 Nov 2008 at 9:44

GoogleCodeExporter commented 9 years ago
First of all, thanks for the patch.

I've applied most of it, but there are issues with the mb_ereg* functions:
 * mb_ereg and ereg do not support lazy quantifiers (e.g. /a+?/)
 * mb_ereg supports special character groups (e.g. \w, \d), but plain ereg doesn't

I've added a compatibility layer for the mb_* functions, but I couldn't use the
ereg equivalent because of the second issue mentioned above. That's why every
regex match must be written separately for the mbstring and non-mbstring cases.
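
For readers unfamiliar with the idea, a compatibility layer of this kind can be sketched roughly as follows. This is not the code committed to phpQuery: the helper names `compat_strlen` and `compat_substr` are hypothetical, the sketch assumes PHP 7.1 or later and UTF-8 input, and it covers only plain string functions, not the regex matching discussed above.

```php
<?php
// Hypothetical helpers, not phpQuery's actual code: use mbstring when the
// extension is loaded, otherwise fall back to the byte-oriented functions.
function compat_strlen(string $s): int
{
    return function_exists('mb_strlen') ? mb_strlen($s, 'UTF-8') : strlen($s);
}

function compat_substr(string $s, int $start, ?int $length = null): string
{
    if (function_exists('mb_substr')) {
        // A null length means "up to the end of the string".
        return mb_substr($s, $start, $length, 'UTF-8');
    }
    return $length === null ? substr($s, $start) : substr($s, $start, $length);
}
```

Without mbstring the fallback still counts bytes, so multibyte input is only handled correctly when the extension is available.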

There is still some non-mbstring-aware regex matching in the code, but it
doesn't affect standard use. Your example works, but only once the proper
encoding for mbstring has been set, like so:
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
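
As a small illustration of why the encoding matters, the sketch below compares byte-oriented and multibyte string functions on the tag name from the sample. It assumes the mbstring extension is loaded and the script file is saved as UTF-8; the variable names are made up for the example.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$tag = 'документа';

// strlen() counts bytes; each Cyrillic letter takes two bytes in UTF-8.
$bytes = strlen($tag);
// mb_strlen() counts characters using the internal encoding set above.
$chars = mb_strlen($tag);

// substr() cuts at a byte offset and can split a character in half;
// mb_substr() cuts at a character offset.
$byteCut = substr($tag, 0, 3);
$charCut = mb_substr($tag, 0, 3);

// Prints: 18 bytes, 9 characters, first three characters: док
printf("%d bytes, %d characters, first three characters: %s\n", $bytes, $chars, $charCut);
```

The byte-based cut ends in the middle of the second letter, which is the kind of breakage the mb_* replacements are meant to avoid.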

I've committed the fixes in r289. You have to check out branches/dev to get them.

Can you confirm that this issue is fixed?

Original comment by tobiasz....@gmail.com on 6 Nov 2008 at 12:01

GoogleCodeExporter commented 9 years ago
No, it didn't.
I found the comments in phpQueryObject.php with a link to this page and
uncommented them, but nothing happened.

I used this command:
svn checkout http://phpquery.googlecode.com/svn/trunk/ branches/dev
Maybe it is wrong?

Original comment by buzz...@gmail.com on 12 Nov 2008 at 7:05

GoogleCodeExporter commented 9 years ago
The proper command for checking out the dev branch is:
svn checkout http://phpquery.googlecode.com/svn/branches/dev/ phpQuery-dev

More about SVN in the wiki:
http://code.google.com/p/phpquery/wiki/SVNCheckout

Original comment by tobiasz....@gmail.com on 12 Nov 2008 at 9:57

GoogleCodeExporter commented 9 years ago
r301 works fine.

Original comment by buzz...@gmail.com on 13 Nov 2008 at 5:30