vovanmozg / phpquery

Automatically exported from code.google.com/p/phpquery

Honor document's or default charset in inserts #17

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Example code causing the problem:
pq('body')->append('żźć');

Original issue reported on code.google.com by tobiasz....@gmail.com on 27 Jul 2008 at 8:02

GoogleCodeExporter commented 8 years ago

Original comment by tobiasz....@gmail.com on 29 Jul 2008 at 11:29

GoogleCodeExporter commented 8 years ago
Hi,

I looked at this issue before adding #24.
Exactly the opposite happens.
Using "pq('body')->append('żźć');" I don't have any problem.
But using "pq('body')->append('<span>żźć</span>');" I do!

Original comment by nicolas....@gmail.com on 15 Aug 2008 at 7:04

GoogleCodeExporter commented 8 years ago
There is a mistake in the report. Inserting text nodes doesn't break encoding; I
was thinking about inserting new DOMs.
Thanks for catching it.

Original comment by tobiasz....@gmail.com on 15 Aug 2008 at 8:59

GoogleCodeExporter commented 8 years ago
Hi,

Perhaps a first step towards a fix: replacing line 1765,
"@$DOM->loadHtml($target);", with "@$DOM->loadHtml(utf8_decode($target));" works
perfectly.

Original comment by nicolas....@gmail.com on 18 Aug 2008 at 12:19

GoogleCodeExporter commented 8 years ago
I've applied your idea as a temporary solution, but it doesn't fully solve the problem
(not all docs are utf8).

It's available in /branches/dev, revision 47:
http://code.google.com/p/phpquery/source/detail?r=47

Original comment by tobiasz....@gmail.com on 27 Aug 2008 at 9:49
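
A minimal sketch of why the utf8_decode() workaround can only be partial, assuming the mbstring extension is available. utf8_decode() converts UTF-8 to ISO-8859-1 and is deprecated as of PHP 8.2, so the sketch uses the equivalent mb_convert_encoding() call instead. The helper name to_latin1 is hypothetical and not part of phpQuery. Characters outside Latin-1, such as the "żźć" from the original report, do not survive the conversion.

```php
<?php
// Hypothetical helper: the same UTF-8 -> ISO-8859-1 conversion that
// utf8_decode() performs, written with mbstring.
function to_latin1(string $utf8): string
{
    // Make the replacement for unconvertible characters explicit: "?".
    mb_substitute_character(0x3F);
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$french = "propri\xC3\xA9t\xC3\xA9";      // "propriété" as UTF-8 bytes
$polish = "\xC5\xBC\xC5\xBA\xC4\x87";     // "żźć" as UTF-8 bytes

// "é" exists in Latin-1, so each two-byte sequence becomes the single byte 0xE9.
var_dump(to_latin1($french) === "propri\xE9t\xE9"); // bool(true)

// "ż", "ź" and "ć" have no Latin-1 code point and are replaced by "?".
var_dump(to_latin1($polish) === "???");             // bool(true)
```

So even for UTF-8 documents the workaround is lossy for anything beyond Latin-1, independently of the non-UTF-8 documents mentioned above.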

GoogleCodeExporter commented 8 years ago
Yes, I just saw it. I plan to look into it in a few weeks, I think.

Original comment by nicolas....@gmail.com on 28 Aug 2008 at 8:12

GoogleCodeExporter commented 8 years ago
Another problem has appeared.

Still using append: when the target content contains no HTML but entity-encoded characters
like "&eacute;" for "é", they are displayed raw, so "test&eacute;" is displayed as
"test&eacute;".

I've tried some solutions, like replacing line 1817 of r47,
"$this->DOM->createTextNode( $target )", with
"$this->DOM->createTextNode( utf8_decode(utf8_encode(html_entity_decode($target))) )".
It works for this case but doesn't work when using "append('testé')" directly :/.

I'm searching...

Original comment by nicolas....@gmail.com on 8 Sep 2008 at 9:36

GoogleCodeExporter commented 8 years ago
NB: currently I'm using "append('<span>'.$myText.'</span>')" to do it.

Original comment by nicolas....@gmail.com on 8 Sep 2008 at 9:43

GoogleCodeExporter commented 8 years ago
Try this:
$htmlCode = mb_convert_encoding($htmlCode, 'HTML-ENTITIES', "UTF-8");

Original comment by tobiasz....@gmail.com on 8 Sep 2008 at 10:00

GoogleCodeExporter commented 8 years ago
Thanks for the tip.
To make it work we must reverse the parameters, like this:
"mb_convert_encoding($htmlCode, "UTF-8", 'HTML-ENTITIES');".

But I still have the same problem: the solution that works with "test&eacute;" doesn't
work with "testé", and vice versa.

We should detect which one it is and then call the function.
But how to detect it?

Original comment by nicolas....@gmail.com on 9 Sep 2008 at 7:53
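
The two conversion directions discussed here can be sketched as follows, assuming PHP 7 or later with mbstring. The 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding() is deprecated as of PHP 8.2, so this sketch swaps in html_entity_decode() and mb_encode_numericentity(), which are not the calls used in the thread. Both function names are hypothetical.

```php
<?php
// Entities -> UTF-8 characters (the "reversed parameters" direction).
function entities_to_utf8(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// UTF-8 characters -> numeric entities for every non-ASCII code point
// (the direction of the original tip).
function utf8_to_entities(string $text): string
{
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF]; // [start, end, offset, mask]
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

$raw = "test\xC3\xA9"; // "testé" as UTF-8 bytes

var_dump(entities_to_utf8('test&eacute;') === $raw); // bool(true)
var_dump(entities_to_utf8($raw) === $raw);           // bool(true): no entities, unchanged
var_dump(utf8_to_entities($raw) === 'test&#233;');   // bool(true)
```

With an explicit 'UTF-8' charset, the decoding direction leaves text without entities untouched, which may sidestep the detection question for UTF-8 input. Note that it also decodes entities such as "&lt;" and "&amp;".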

GoogleCodeExporter commented 8 years ago
NB: I've added "mb_internal_encoding('UTF-8');"; it's OK but the problem remains.
Also, I've tried a lot of solutions, not only "mb_convert_encoding($htmlCode, "UTF-8",
'HTML-ENTITIES');", but nothing works for both cases :/.

Original comment by nicolas....@gmail.com on 9 Sep 2008 at 7:55

GoogleCodeExporter commented 8 years ago
Solution found!

I've replaced the else statement at line 1814 in r47 with
"
// insert selected element
} else {
        // pure-ASCII input may contain HTML entities; decode them to UTF-8
        if (mb_detect_encoding($target) == 'ASCII')
                $target = mb_convert_encoding($target, 'UTF-8', 'HTML-ENTITIES');
        $insertFrom = array(
                $this->DOM->createTextNode($target)
        );
}
"

Enjoy ^^! But it still needs a proper fix, like the utf8_decode() addition.

Original comment by nicolas....@gmail.com on 9 Sep 2008 at 8:23
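
The detection step in the patch above can be isolated as a small function. This is a sketch only: prepare_text_node_content is a hypothetical name, mb_check_encoding() stands in for the mb_detect_encoding() comparison, and html_entity_decode() replaces the 'HTML-ENTITIES' conversion, which is deprecated as of PHP 8.2. Non-ASCII input is assumed to be UTF-8 already.

```php
<?php
// Hypothetical helper mirroring the guard in the patch: decode entities only
// when the string is pure ASCII, otherwise pass it through untouched.
function prepare_text_node_content(string $target): string
{
    if (mb_check_encoding($target, 'ASCII')) {
        // ASCII-only input may carry entities such as "&eacute;".
        return html_entity_decode($target, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    // Assumption: anything that is not ASCII is already UTF-8 text.
    return $target;
}

$raw = "test\xC3\xA9"; // "testé" as UTF-8 bytes

var_dump(prepare_text_node_content('test&eacute;') === $raw); // bool(true)
var_dump(prepare_text_node_content($raw) === $raw);           // bool(true)
```

As in the original patch, a string that mixes raw non-ASCII characters with entities is passed through unchanged by this guard.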

GoogleCodeExporter commented 8 years ago
One solved, another appears :/.

Using "appendTo($phpQueryObject)", all my text nodes are truncated after the first
accented character (e.g. "Article propriété" becomes "Article propri?").

This happens when appending a list element containing some items:
<code>$myFirstPhpQueryElement->appendTo($mySecondPhpQueryElement);</code>

I'm working on it currently...

Original comment by nicolas....@gmail.com on 9 Sep 2008 at 1:14

GoogleCodeExporter commented 8 years ago
Happily, I cannot reproduce the truncation after the first accented character ;)

I've applied all patches mentioned here in r49 and added a test case in
test_encoding.php which tries to reproduce your last report.

Original comment by tobiasz....@gmail.com on 13 Sep 2008 at 7:17

GoogleCodeExporter commented 8 years ago
I have also changed the attr() method, replacing
<code>$node->setAttribute($a, $value);</code> with
<code>$node->setAttribute($a, mb_convert_encoding($value, 'UTF-8', 'HTML-ENTITIES'));</code>

<code>mb_convert_encoding($value, 'UTF-8', 'HTML-ENTITIES')</code> seems to be very
useful in this quest :). I think.

Original comment by nicolas....@gmail.com on 17 Sep 2008 at 12:07

GoogleCodeExporter commented 8 years ago
I've added the attr fix in r52 or r53.

This seems to fix all issues for utf8 documents?

Original comment by tobiasz....@gmail.com on 17 Sep 2008 at 2:58

GoogleCodeExporter commented 8 years ago
Issue 60 should fix this one. Not only for utf-8.
http://code.google.com/p/phpquery/issues/detail?id=60

Please check.

Original comment by tobiasz....@gmail.com on 16 Oct 2008 at 12:16

GoogleCodeExporter commented 8 years ago
I'm having charset trouble using r203.
It was OK before r203.

Doing:
<php>
$miniature_element
        ->find('th.mois_enCours')
            ->text(strtolower(strftime('%B %Y',$ts_actuel)))
;
</php>

"strftime('%B %Y',$ts_actuel)" returns "décembre 2008".
Before r203, no problem.
With r203 the "é" isn't displayed correctly (Firefox shows a "?" in a diamond).

Original comment by nicolas....@gmail.com on 16 Oct 2008 at 1:56

GoogleCodeExporter commented 8 years ago
NB: same effect using the "html()" or "append()" method.

Original comment by nicolas....@gmail.com on 16 Oct 2008 at 2:00

GoogleCodeExporter commented 8 years ago
By default, strftime returns a string in the latin1 charset. Read more about it in this comment:
http://pl.php.net/manual/en/function.strftime.php#72482

Tested and worksforme:
setlocale(LC_ALL, 'pl_PL.UTF-8');
$string =  strftime('%B %Y', time());
$doc['p:first']->append($string)->dump();

Marking this issue as fixed, as all inserts/appends to an existing document now honor
the target document's charset. No charset conversions are implemented.

Original comment by tobiasz....@gmail.com on 18 Oct 2008 at 8:48
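
Since no charset conversion is implemented, callers need to pass strings that already match the document's charset. Below is a minimal sketch for a UTF-8 document, assuming mbstring and assuming that any input which is not valid UTF-8 is Latin-1. ensure_utf8 is a hypothetical helper, not a phpQuery function.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from a fallback
// charset only when the bytes are not already valid UTF-8.
function ensure_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (this includes plain ASCII)
    }
    // Assumption: invalid UTF-8 means the string is in the fallback charset.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$latin1 = "d\xE9cembre 2008";     // "décembre 2008" as Latin-1 bytes
$utf8   = "d\xC3\xA9cembre 2008"; // the same text as UTF-8 bytes

var_dump(ensure_utf8($latin1) === $utf8); // bool(true)
var_dump(ensure_utf8($utf8) === $utf8);   // bool(true): left untouched
```

Setting a UTF-8 locale, as in the tested snippet above, avoids the conversion entirely; a helper like this is only a fallback for strings whose source charset cannot be controlled.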