symphonists / search_index

Search Index provides an easy way to implement high performance fulltext searching on your Symphony site

Truncation isn't UTF-8 compatible #28

Closed kirkstrobeck closed 7 years ago

kirkstrobeck commented 12 years ago
<excerpt><p> img_0335_2-4ea5b0c3834e0.jpg 1163 2011-10-24 Schminkanleitung: Step 1: Zeichnen Sie mit dem Kinder-Schminkstift Grün eine ovale Form über das ganze Gesicht. Anschließend mischen Sie aus den Tiegel-Farben Grün, Gelb und Weiß ein blasses gelb-gr�&#8230;</p></excerpt>

I'm having issues with German characters being cut into invalid byte sequences.
My guess is a UTF-8 substring error somewhere.

Taken from ..

Zeichnen Sie mit dem Kinder-Schminkstift Grün eine ovale Form über das ganze Gesicht. Anschließend mischen Sie aus den Tiegel-Farben Grün, Gelb und Weiß ein blasses gelb-grün zusammen und malen damit die Fläche mit dem breiten Pinsel aus.

I checked through the code and saw your mb_substring fix and the call to self, so I'm stumped :(

The XSLT error page displays

loadXML(): Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0x26 0x23 0x38 in Entity, line: 760
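
The bytes in that error look consistent with a byte-level cut: 0xC3 is the first byte of "ü" (0xC3 0xBC in UTF-8), and 0x26 0x23 0x38 is the start of the appended "&#8230;" entity. Here is a minimal Python sketch of that failure mode next to a character-aware cut. The helper names `truncate_bytes` and `truncate_chars` are hypothetical, made up for illustration; this is not the extension's actual code, only an assumption about what kind of truncation would produce these bytes.

```python
# Illustration only: hypothetical helpers, not the Search Index extension's code.
ELLIPSIS = "&#8230;"


def truncate_bytes(text: str, limit: int) -> bytes:
    """Naive cut on the UTF-8 byte string; may split a multi-byte character."""
    return text.encode("utf-8")[:limit] + ELLIPSIS.encode("ascii")


def truncate_chars(text: str, limit: int) -> bytes:
    """Cut on code points first, then encode, so no character is split."""
    return (text[:limit] + ELLIPSIS).encode("utf-8")


sample = "gelb-grün zusammen"

# "gelb-gr" is 7 bytes, so byte 8 is the lead byte of "ü".
broken = truncate_bytes(sample, 8)
safe = truncate_chars(sample, 8)

# Show the four bytes around the cut point in hex.
print(" ".join(f"{b:02x}" for b in broken[7:11]))

try:
    broken.decode("utf-8")
except UnicodeDecodeError as err:
    print("byte-level cut is not valid UTF-8:", err.reason)

print(safe.decode("utf-8"))
```

The character-aware version cuts after a whole "ü", so the result stays valid UTF-8 and an XML parser would have no reason to reject it.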

nitriques commented 7 years ago

@animaux Would this be useful? We have a method in the General class that uses mbstring when it is installed.

animaux commented 7 years ago

I haven’t encountered this myself, but it would be useful, since it should affect not only German characters but plenty of other UTF-8 characters too. (I think.)

michael-e commented 7 years ago

I remember that the UTF-8 truncation issue has been fixed in the Symphony core some time ago. (Probably it uses the method from the General class which @nitriques mentioned.)

animaux commented 7 years ago

@michael-e So this is likely fixed then?

michael-e commented 7 years ago

Yes, it is fixed in the core, as far as I know. One might take a look there to see how it has been done. :-)

animaux commented 7 years ago

That would likely be pearls before swine in my case :D.

michael-e commented 7 years ago

So you can only hope for @nitriques getting his hands dirty while you sit back and relax. :-)

animaux commented 7 years ago

Better than the ol’ carpenter fixing the delicate watch clockwork, I guess.

nitriques commented 7 years ago

Great to see this! Closing!