Ok... I must have used a different htmlentities function on the front-end.
I'll look into this as soon as I'm back from vacation.
Original comment by fireproofsocks
on 13 May 2011 at 8:26
Holy smokes, this is way more complicated than I thought. I can't seem to find
where this is getting hijacked... WP might be running the KSES filter on this
thing: the variables coming through the $_POST array are NOT the variables I'm
writing to the form fields. WTF?
Original comment by fireproofsocks
on 15 May 2011 at 4:21
Yes, this is descending into encoding hell:
http://pa2.php.net/manual/en/function.utf8-decode.php
If the mbstring extension isn't installed on a server, there's going to be no
reliable way to determine the encoding used.
$x = 'ę';
htmlspecialchars($x); // u0119
html_entity_decode($x); // u00c4ufffd
Original comment by fireproofsocks
on 15 May 2011 at 4:48
This seems to hold some promise:
$z = htmlspecialchars(utf8_encode($y));
print utf8_decode(htmlspecialchars_decode($z));
But I'll have to use mb_detect_encoding() to check whether that conversion
should kick in.
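A minimal sketch of that kind of check for servers where mbstring may be missing. It assumes that any input which is not valid UTF-8 is ISO-8859-1 (the only source encoding utf8_encode() handles), and ensure_utf8() is a hypothetical helper name, not part of the plugin. It relies on preg_match() with the /u modifier failing on subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper: return $s as UTF-8 without needing mbstring.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function ensure_utf8($s) {
    // With the /u modifier, preg_match() returns 1 for a valid UTF-8 subject
    // and false for an invalid one.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    // Each Latin-1 byte 0x80-0xFF becomes a two-byte UTF-8 sequence.
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        $byte = ord($m[0]);
        return chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
    }, $s);
}

var_dump(ensure_utf8("\xC4\x99") === "\xC4\x99"); // UTF-8 input is left alone
var_dump(ensure_utf8("\xFC") === "\xC3\xBC");     // Latin-1 input is converted
```

The byte escapes ("\xC4\x99") are used so the example does not depend on how the source file itself is saved.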
Original comment by fireproofsocks
on 15 May 2011 at 1:29
Issue 99 has been merged into this issue.
Original comment by fireproofsocks
on 9 Jun 2011 at 3:51
This is truly maddening... consider the following PHP snippets:
<?php
print 'ę'; // Run via command line, this works... via Apache, it prints Ä™
print htmlspecialchars('ę'); // does nothing... returns 'ę'
print utf8_encode('ę'); // returns Ä<99> WTF?
print htmlentities('ę'); // returns Ä# WTF?
?>
So this has something to do with the php.ini options (my system has a different
php.ini for command line and for apache).
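One quick way to compare the two environments is to print the relevant settings from each. This is only a sketch using built-in functions; the idea is to run it once from the command line and once through Apache and compare the output.

```php
<?php
// Print the settings that commonly differ between the CLI and Apache setups.
$report = array(
    'sapi'            => php_sapi_name(),        // which PHP interface is running
    'php.ini'         => php_ini_loaded_file(),  // false if no php.ini was loaded
    'default_charset' => ini_get('default_charset'),
    'mbstring loaded' => extension_loaded('mbstring') ? 'yes' : 'no',
);
foreach ($report as $name => $value) {
    print $name . ': ' . var_export($value, true) . "\n";
}
```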
Original comment by fireproofsocks
on 10 Jul 2011 at 11:08
So the page headers affect the encoding... but we can't change the page
headers. I've been trying some of the functions outlined on
http://www.php.net/manual/en/function.utf8-encode.php#93162, but so far,
nothing works.
Original comment by fireproofsocks
on 11 Jul 2011 at 5:51
Aha... something here worked:
function UTF8ToEntities ($string) {
/* note: apply htmlspecialchars if desired /before/ applying this function */
/* Only do the slow convert if there are 8-bit characters */
/* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
if (! ereg("[\200-\237]", $string) and ! ereg("[\241-\377]", $string))
return $string;
// reject too-short sequences
$string = preg_replace("/[\302-\375]([\001-\177])/", "&#65533;\\1", $string);
$string = preg_replace("/[\340-\375].([\001-\177])/", "&#65533;\\1", $string);
$string = preg_replace("/[\360-\375]..([\001-\177])/", "&#65533;\\1", $string);
$string = preg_replace("/[\370-\375]...([\001-\177])/", "&#65533;\\1", $string);
$string = preg_replace("/[\374-\375]....([\001-\177])/", "&#65533;\\1", $string);
// reject illegal bytes & sequences
// 2-byte characters in ASCII range
$string = preg_replace("/[\300-\301]./", "&#65533;", $string);
// 4-byte illegal codepoints (RFC 3629)
$string = preg_replace("/\364[\220-\277]../", "&#65533;", $string);
// 4-byte illegal codepoints (RFC 3629)
$string = preg_replace("/[\365-\367].../", "&#65533;", $string);
// 5-byte illegal codepoints (RFC 3629)
$string = preg_replace("/[\370-\373]..../", "&#65533;", $string);
// 6-byte illegal codepoints (RFC 3629)
$string = preg_replace("/[\374-\375]...../", "&#65533;", $string);
// undefined bytes
$string = preg_replace("/[\376-\377]/", "&#65533;", $string);
// reject consecutive start-bytes
$string = preg_replace("/[\302-\364]{2,}/", "&#65533;", $string);
// decode four byte unicode characters
$string = preg_replace(
"/([\360-\364])([\200-\277])([\200-\277])([\200-\277])/e",
"'&#'.((ord('\\1')&7)<<18 | (ord('\\2')&63)<<12 |" .
" (ord('\\3')&63)<<6 | (ord('\\4')&63)).';'",
$string);
// decode three byte unicode characters
$string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
"'&#'.((ord('\\1')&15)<<12 | (ord('\\2')&63)<<6 | (ord('\\3')&63)).';'",
$string);
// decode two byte unicode characters
$string = preg_replace("/([\300-\337])([\200-\277])/e",
"'&#'.((ord('\\1')&31)<<6 | (ord('\\2')&63)).';'",
$string);
// reject leftover continuation bytes
$string = preg_replace("/[\200-\277]/", "&#65533;", $string);
return $string;
}
//------------------------------------------------------------------------------
$opt = 'ę';
$utf = UTF8ToEntities( htmlspecialchars($opt) );
print $utf; // prints the correct output (the entity &#281;, which renders as ę)
Original comment by fireproofsocks
on 11 Jul 2011 at 5:57
Or more simply:
<?php
function charset_decode_utf_8 ($string) {
/* Only do the slow convert if there are 8-bit characters */
/* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
if (! ereg("[\200-\237]", $string) and ! ereg("[\241-\377]", $string))
return $string;
// decode three byte unicode characters
$string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
"'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",
$string);
// decode two byte unicode characters
$string = preg_replace("/([\300-\337])([\200-\277])/e",
"'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
$string);
return $string;
}
?>
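Note that ereg() and the /e modifier used above were both removed in PHP 7, so the function will not run as written on current PHP versions. A rough equivalent using preg_replace_callback() is sketched below. It is not a drop-in replacement: utf8_to_entities() is a hypothetical name, the input is assumed to be valid UTF-8, and invalid input is returned unchanged rather than repaired.

```php
<?php
// Hypothetical helper: replace every non-ASCII UTF-8 character with a
// decimal numeric entity. Assumption: $s is valid UTF-8.
function utf8_to_entities($s) {
    $out = preg_replace_callback('/[^\x00-\x7F]/u', function ($m) {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // Payload bits of the lead byte: 5 bits for 2-byte sequences,
        // 4 bits for 3-byte sequences, 3 bits for 4-byte sequences.
        $cp = $bytes[0] & (0xFF >> ($n + 1));
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#' . $cp . ';';
    }, $s);
    // preg_replace_callback() returns null when the subject is not valid UTF-8.
    return $out === null ? $s : $out;
}

print utf8_to_entities("xyz\xC4\x99egg"); // xyz&#281;egg
```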
Original comment by fireproofsocks
on 11 Jul 2011 at 7:03
Hi!
First, excuse my slightly odd English; I'm French.
I think I've figured out a simple fix for this problem, which I had too.
First, you have to save all the PHP files of the plugin with UTF-8 encoding
(not ISO Latin-1).
In fact, I only tested this with a few files (like "includes/CCTM.php" and
"includes/pages/post_type.php"),
and it worked fine for me as far as I can see.
Second, you must get rid of htmlentities in the "includes/pages/*.php" files.
For example, I changed line 143 in post_type.php from
<textarea name="description" class="cctm_textarea" id="description" rows="4" cols="60"><?php print htmlentities($def['description']); ?></textarea>
to
<textarea name="description" class="cctm_textarea" id="description" rows="4" cols="60"><?php print ($def['description']); ?></textarea>
And voilà! On the main "custom content types" page and the "edit
content type" page, it did the trick.
I don't know if all this is really clear, but I hope it helps and that this
fix makes it into the next update! ;)
Have fun!
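One caveat with printing the value with no escaping at all: a description containing </textarea> or a quote would break the form. A possible middle ground, assuming the stored value and the page are both UTF-8, is to keep the escaping but call htmlspecialchars() with an explicit charset, which only touches &, <, > and quotes and leaves multibyte characters alone. cctm_esc() is a hypothetical helper name, not something the plugin provides.

```php
<?php
// Hypothetical helper: escape for HTML output without mangling UTF-8 text.
// Assumption: $value is UTF-8.
function cctm_esc($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The markup characters are escaped; the multibyte character passes through.
print cctm_esc("z\xC5\x82oty </textarea> \"quoted\"");
```

In the template this would read: print cctm_esc($def['description']);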
Original comment by whadaff@gmail.com
on 13 Jul 2011 at 2:22
Thanks -- that's part of the problem, but it's a bit more complicated than that
-- it also depends on the settings on your server, so it requires a few other
changes as well so it can work the same way on multiple servers. I think I
have a solution figured out -- I'll post it shortly.
Original comment by fireproofsocks
on 13 Jul 2011 at 5:10
Ugh... still no luck. I'm able to print the correct html entities into the
form values, but when it comes through the post array, it still gets converted,
e.g. "u0119" and "u0142"... so I think I need to write the converse of the
charset_decode_utf_8() function... one that takes u0119 and outputs the wily
foreign character...
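A possible shape for that converse function, as a sketch only. It assumes the backslash is still present (\u0119, as json_encode() writes it); once the backslash has been stripped, a bare u0119 cannot be told apart from ordinary text. json_unescape_unicode() is a hypothetical name, and surrogate pairs are not handled.

```php
<?php
// Hypothetical helper: turn \uXXXX escapes back into UTF-8 characters.
function json_unescape_unicode($s) {
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($m) {
        // Let json_decode() convert a single escaped character.
        $char = json_decode('"\\u' . $m[1] . '"');
        // Leave the escape untouched if it cannot be decoded
        // (for example, a lone surrogate).
        return is_string($char) ? $char : $m[0];
    }, $s);
}

print json_unescape_unicode('xyz\u0119egg');
```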
Original comment by fireproofsocks
on 16 Jul 2011 at 8:32
"I manually encoded the characters to Special HTML Characters, and created a
field containing those characters. Next I headed to
/includes/elements/multiselect.php and on line 145 changed
htmlspecialchars($opt) to htmlspecialchars_decode($opt). In my template file I
included the field with the following code:
$g = get_custom_field('genre', ', '); echo htmlspecialchars_decode($g);
and it worked.
I know it's not a very good idea to encode to HTML Characters, but it's the
only way to avoid problems with UTF-8 encoding. I believe it's possible to
encode the field characters to HTML Characters before they go into the database
(though I couldn't find where it's done). The problem lies in including the
field in the template file, because get/print_custom_field is not a part of
the plugin, but I'm sure it's possible to work around this."
Original comment by fireproofsocks
on 3 Sep 2011 at 5:25
I finally found something that worked, at least in an independent test. Check
this out:
<?php
function charset_decode_utf_8($string) {
$string = htmlspecialchars($string);
/* Only do the slow convert if there are 8-bit characters */
/* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
if (! preg_match("/[\200-\237]/", $string) and ! preg_match("/[\241-\377]/", $string)) {
return $string;
}
// decode three byte unicode characters
$string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e","'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",$string);
// decode two byte unicode characters
$string = preg_replace("/([\300-\337])([\200-\277])/e", "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'", $string);
return $string;
}
?>
<html>
<head><title>Test form</title></head>
<body>
<?php
if ( !empty($_POST) ){
print_r($_POST);
}
?>
<form method="post">
<div class="cctm_element_wrapper" id="custom_field_mymulti">
<label for="cctm_mymulti" class="cctm_label cctm_multiselect cctm_multiselect_checkbox" id="cctm_label_mymulti">
MyMulti
</label>
<br/><div class="cctm_muticheckbox_wrapper">
<input type="checkbox" name="cctm_mymulti[]" class="cctm_mymulti cctm_muticheckbox" id="cctm_mymulti0" value="<?php print charset_decode_utf_8('xyzęegg'); ?>" > <label class="cctm_muticheckbox" for="cctm_mymulti0"><?php print charset_decode_utf_8('xyzęegg'); ?></label></div><br/><div class="cctm_muticheckbox_wrapper">
<input type="checkbox" name="cctm_mymulti[]" class="cctm_mymulti cctm_muticheckbox" id="cctm_mymulti1" value="<?php print charset_decode_utf_8('xyzüuugh'); ?>" > <label class="cctm_muticheckbox" for="cctm_mymulti1"><?php print charset_decode_utf_8('xyzüuugh'); ?></label></div><br/><div class="cctm_muticheckbox_wrapper">
<input type="checkbox" name="cctm_mymulti[]" class="cctm_mymulti cctm_muticheckbox" id="cctm_mymulti2" value="<?php print charset_decode_utf_8('normal">'); ?>" > <label class="cctm_muticheckbox" for="cctm_mymulti2"><?php print charset_decode_utf_8('normal">'); ?></label></div><br/><span class="cctm_description">Testing</span>
</div>
<input type="submit" value="Submit" />
</form>
</body>
</html>
That WORKS. The foreign characters are properly converted to their HTML-entity
equivalents. But the multi-select doesn't want to get that field out of the
json array correctly...
Original comment by fireproofsocks
on 29 Sep 2011 at 2:30
AHA. It's WP's get_post_meta() and update_post_meta() that are causing this to
fail. Look:
print $value; // ["xyz\u0119egg","xyz\u00fcuugh","normal"] <--- being sent to the database
print "<hr/>";
update_post_meta( $post_id, $field_name, $value );
$x = get_post_meta($post_id, $field_name, true);
print_r($x); exit; // ["xyzu0119egg","xyzu00fcuugh","normal"] <--- coming back from the database
Original comment by fireproofsocks
on 29 Sep 2011 at 2:54
Original comment by fireproofsocks
on 29 Sep 2011 at 3:07
So it appears the solution is to call addslashes() on the value before it goes
into the database. I've updated the multiselect.php class and modified its
save_post_filter() function:
return addslashes(json_encode($posted_data[ CCTMFormElement::post_name_prefix . $field_name ]));
That finally works, and will be available in 0.9.4. Now to work out the other
fields that are getting weird quotes now.
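The effect can be reproduced without WordPress. The sketch below assumes the storage layer strips one level of slashes from the value before saving, which is what the output above suggests; fake_store() is a hypothetical stand-in for update_post_meta(), not WordPress code.

```php
<?php
// Hypothetical stand-in for a save routine that strips slashes once.
function fake_store($value) {
    return stripslashes($value);
}

// json_encode() escapes non-ASCII characters as \uXXXX by default.
$json = json_encode(array("xyz\xC4\x99egg", "normal"));

print $json . "\n";                          // ["xyz\u0119egg","normal"]
print fake_store($json) . "\n";              // ["xyzu0119egg","normal"]
print fake_store(addslashes($json)) . "\n";  // ["xyz\u0119egg","normal"]
```

Adding the slashes first gives the stripping step something to remove, so the \u escape comes back out intact.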
Original comment by fireproofsocks
on 29 Sep 2011 at 3:41
Way to go !!! It wasn't that obvious after all, Thanks again for your
dedication and making your plugin better.
Original comment by Mt.Zieli...@gmail.com
on 30 Sep 2011 at 3:41
Found one more glitch with this that has to do with the differences between how
WP handles updating meta data and creating it. I had to double up on the
addslashes when the post is being created. Craziness. But it's in 0.9.4.
Original comment by fireproofsocks
on 30 Sep 2011 at 4:05
Original issue reported on code.google.com by
Mt.Zieli...@gmail.com
on 13 May 2011 at 4:06