matomo-org / matomo


The string to escape is not a valid UTF-8 string in "@CoreHome/getDefaultIndexView.twig". #4410

Closed: mattab closed this issue 8 years ago

mattab commented 10 years ago

Reported in the forum: http://forum.piwik.org/read.php?2,108645

A solution was proposed there: http://forum.piwik.org/read.php?2,108645,page=1#msg-108949

It works by setting:

[database]
charset = utf8

mattab commented 10 years ago

Since there is a workaround available, I am decreasing the priority of this ticket. If you experience this bug, please comment here! We would like to hear from more users having the issue.

mattab commented 10 years ago

Increasing priority as a user sent us the database by email, so let's try to replicate!

Please report in this issue if you experience this bug; we need more reports to understand it.

ghost commented 9 years ago

I had the same problem after our hoster Mittwald provided an update from Piwik 2.10 to 2.13.1. Adding this charset seemed to fix the problem, but umlauts from old entries look broken now, so I reverted. It's PHP 5.6.5 and MySQL 5.5. The database and tables are utf8_general_ci. php.ini has default_charset = "UTF-8". We track several websites, but I had this problem with only one of them. The other websites, which still work, also contain umlauts, and there was no problem without the setting.

I think I could track it down. The problem is umlauts in a goal name. I renamed the goal with an umlaut in the DB, and afterwards the Piwik frontend was working again for this website, too.
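
The error message is raised when the template engine is asked to escape bytes that are not well-formed UTF-8. As a language-neutral illustration (Python here, not Piwik's PHP), the sketch below shows how the same umlaut passes or fails such a check depending on how it was encoded. `is_valid_utf8` is a hypothetical helper and the goal name is made up.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


goal_name = "Anmeldung für Newsletter"  # illustrative goal name with an umlaut

utf8_bytes = goal_name.encode("utf-8")         # "ü" becomes the two bytes C3 BC
latin1_bytes = goal_name.encode("iso-8859-1")  # "ü" becomes the single byte FC

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False: a lone FC byte is not valid UTF-8
```

A value written to the database over a connection with the wrong charset can end up in the second form, which would explain why renaming the goal makes the error go away.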

anEffingChamp commented 8 years ago

I hit this problem on a fresh installation. Piwik reports:

The string to escape is not a valid UTF-8 string in "@Installation/welcome.twig" at line 1.

I checked the template files for the plugin, and everything looks fine. The error reported suggests that the problem may exist in the layout.twig, but that seems fine too. However, when I remove the offending line Piwik progresses to the first installation page, but with a lot of HTML missing.

garvinhicking commented 8 years ago

Hi!

After upgrading from 2.15.0 to 2.16.0 viewing a segment report comes up with:

The string to escape is not a valid UTF-8 string in "@CoreHome/getDefaultIndexView.twig" at line 7.

There are several segment reports that use umlauts. They worked just the same in 2.15.0 (!!!), and in the normal backend they show up without a problem in dropdowns.

What's happening? I googled a bit and found the advice to add 'charset = "utf8"' to config.ini.php, but that doesn't change anything.

I don't know Twig, how can I debug exactly what UTF-8 string is making trouble here?!

VorobeY1326 commented 8 years ago

Exactly the same issue after updating to 2.16.0. Trying to segment by userId -> fails in getDefaultIndexView at line 7. I'm not aware of any special UTF-8 symbols being involved; userId is just an md5 string.

Edit: every segmented report fails, not only by userId.

garvinhicking commented 8 years ago

To the devs: I'd love to debug. Can you give a pointer to where in PHP scope this line 7 gets filled, or how to see the actual "bad" content? I don't know where to start looking.

VorobeY1326 commented 8 years ago

+1, ready to debug this issue, just give some hint

schwindelbub commented 8 years ago

I had the same issue after migrating an existing Piwik instance to another server. We found that two things resolved it:

  1. Switch the user language to "English" instead of "Deutsch" in the user profile and clear the browser cache. But then every user has to give up the "Deutsch" setting, so I think it's a bad workaround.
  2. Set default_charset in php.ini. Maybe Piwik does not handle it correctly. If this value in php.ini is empty, the default fallback "UTF-8" should apply, but if we set it explicitly everything works fine. http://php.net/manual/de/ini.core.php#ini.default-charset OK, it should not be empty, but it can be.

Sorry, but unfortunately the second option doesn't work for me. System: PHP 5.6.19, 5.5.44-MariaDB

mattab commented 8 years ago

@schwindelbub @VorobeY1326 @garvinhicking @anEffingChamp @hdi-kw Please test this patch: https://github.com/piwik/piwik/pull/9926/files

Does it fix the issue for you?

@schwindelbub thank you for the tip re: default_charset - I never came across this setting before and this may actually be the solution :-)

garvinhicking commented 8 years ago

Sadly no change for me, I also forced default_charset in the php.ini.

I'd really love to understand which string exactly twig is trying to escape in getDefaultIndexView.twig at line 7. There must be some way to intercept the actual string so that I can understand where it comes from, and in which charset it is?!

schwindelbub commented 8 years ago

For me there is also no change with this patch.

schwindelbub commented 8 years ago

I found another problem: If a visitor comes from a Google search with a "ß" in the search term, this term is broken in the report.

For example: Searchterm: maße In the report: maã_e

Every other Umlaut works fine.
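
The `maã_e` pattern is consistent with the UTF-8 bytes of "ß" being treated as single-byte characters somewhere along the way. The Python sketch below is an assumption about the mechanism, not a trace of Piwik's code: it reads the UTF-8 bytes of `maße` as ISO-8859-1 and lowercases the result. The final step that swaps the unprintable character for `_` is also assumed.

```python
term = "maße"

raw = term.encode("utf-8")          # b'ma\xc3\x9fe': "ß" is the two bytes C3 9F
misread = raw.decode("iso-8859-1")  # C3 -> "Ã", 9F -> a C1 control character
lowered = misread.lower()           # "Ã" -> "ã"; the control character is unchanged

# Assumed display step: replace non-printable characters with "_".
shown = "".join(ch if ch.isprintable() else "_" for ch in lowered)
print(shown)  # maã_e
```

This reproduces the observed string, but it does not explain why the other umlauts are unaffected, so it should be read as a hint rather than a diagnosis.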

schwindelbub commented 8 years ago

As a hint: I migrated from Debian with PHP 5.4.45 and MySQL 5.5.47 to CentOS with PHP 5.6.19 and 5.5.44-MariaDB

schwindelbub commented 8 years ago

I found a workaround that fixes the crash: Just comment out the Template.nextToCalendar line in the Twig template getDefaultIndexView.twig

{# {{ postEvent("Template.nextToCalendar") }} #}

I have no idea what this line does :-/ But it works without :-)

garvinhicking commented 8 years ago

@schwindelbub

Cool, thanks for your effort, if the devs don't care :) I'll try it out; probably some event hooks for plugins are executed there... Maybe the problem is the system locale, which uses ISO instead of UTF-8 for date outputs. Maybe setlocale() instead of default_charset can help here. My install is running on Windows; probably there's no UTF-8 locale available there...

Cruiser13 commented 8 years ago

Same issue for me, every segmented report fails with error message The string to escape is not a valid UTF-8 string in "@CoreHome/getDefaultIndexView.twig" at line 7 - any support would be welcome.

Viswan-piwki commented 8 years ago

Hi there, please help in fixing this. Setting the charset does not work. Should we install an earlier version, say 2.15?

VorobeY1326 commented 8 years ago

@schwindelbub The workaround works because this line renders the element with segmented reports. It works for me too; now I can't choose any segment: no segment, no crash :)

VorobeY1326 commented 8 years ago

More interesting scenario:

  1. Select some segment -> site fails with exception "The string to escape ..."
  2. Comment out the line with "nextToCalendar" as @schwindelbub suggested
  3. Restart site
  4. Reload failed page -> page opens, segmenting works, but the commented-out element is gone, so I can't choose some other segment

Seems like this element (where I can choose segment) fails to render if some segment is selected. And in my case name of this segment was "IE 11", nothing very special.

tsteur commented 8 years ago

FYI: For those who want to debug, it gets filled here https://github.com/piwik/piwik/blob/2.16.0/plugins/SegmentEditor/SegmentEditor.php#L35-L36 with this template https://github.com/piwik/piwik/blob/2.16.0/plugins/SegmentEditor/templates/_segmentSelector.twig

tsteur commented 8 years ago

To debug maybe remove some parts from the twig file, reload the page, and see if it still occurs. Slowly maybe remove more parts until the error doesn't occur anymore. This way you can maybe find a part of the template where it tries to escape the value.

However, likely it is not a problem with the template itself but the stored value in the database. I would recommend to make sure

If someone can give me access to the server that has such a problem I would be happy to debug and try to find the problem. I would need access to the files, the database and the actual Piwik UI. If someone can provide access please email us at hello (at) piwik.org and afterwards leave a comment here in case it goes into spam folder.

Viswan-piwki commented 8 years ago

Ok, now Cannot connect to the database:

SQLSTATE[42000] [1115] Unknown character set: 'UTF-8'

Viswan-piwki commented 8 years ago

If I replace UTF-8 with utf8, I get this error back: The string to escape is not a valid UTF-8 string in "@CoreUpdater/runUpdaterAndExit_welcome.twig" at line 1. However, I don't see this line in that Twig file.

garvinhicking commented 8 years ago

@tsteur Thanks, your hat-tip was gold. It is actually not a problem from the database point of view, but seems to stem from the i18n interface itself. See long description below.

I found the issue within SegmentSelectorControl.php when setting $this->segmentDescription. We have segments that have simple "Path contains XXX" definitions. Inside the DB, only the XXX definition is stored, and when it gets put into $this->segmentDescription, a Piwik translation mechanism adds the German translation for "Path contains XXX" to it. For German, this is "Seiten-URL enthält XXX".

This "ä" character corresponds to the URL entity %C3%A4, which is the valid UTF-8 encoding for "ä". However, somewhere I cannot really debug at the moment, it gets handled improperly.

I patched my SegmentSelectorControl to read:

$this->segmentDescription = urlencode($formatter->getHumanReadable(Request::getRawSegmentFromRequest(), $this->idSite));

this will actually put in encoded special characters, but now the Reporting is no longer broken \o/
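
One plausible reason the urlencode() workaround helps: percent-encoding turns every non-ASCII byte into plain ASCII characters, and pure ASCII is always valid UTF-8, so the escaper has nothing left to reject. The sketch below uses Python's `urllib.parse.quote_plus` as a stand-in for PHP's `urlencode()`; the description string is taken from the example above, and the two input encodings are assumptions for illustration.

```python
from urllib.parse import quote_plus

description = "Seiten-URL enthält XXX"

# Correctly encoded input: "ä" is the UTF-8 byte pair C3 A4.
from_utf8 = quote_plus(description.encode("utf-8"))
print(from_utf8)    # Seiten-URL+enth%C3%A4lt+XXX

# Wrongly encoded input (ISO-8859-1): "ä" is the single byte E4.
from_latin1 = quote_plus(description.encode("iso-8859-1"))
print(from_latin1)  # Seiten-URL+enth%E4lt+XXX

# Either way the output is pure ASCII, hence valid UTF-8.
print(from_utf8.isascii(), from_latin1.isascii())  # True True
```

Inspecting the percent codes the patched line produces is therefore a cheap way to see which bytes actually arrive: `%C3%A4` would mean the input was fine, `%E4` would point to a single-byte encoding.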

As for you devs, maybe you can reproduce it by choosing the German interface, so that the special character gets added there. I bet at some point there's some double UTF-8 encoding going on, and with the English-language interface this doesn't happen.

I tried to quickly go through the language files, but couldn't find the file where the string for the Interface is actually defined.

Viswan-piwki commented 8 years ago

Hi there, could anyone please advise on the SQLSTATE error?

VorobeY1326 commented 8 years ago

@garvinhicking Same issue for me; urlencoding in this line fixes the issue. And there are no problems in the DB of the kind @tsteur suggested, only ASCII symbols in segment names.

schwindelbub commented 8 years ago

@garvinhicking Your fix also works for me! Thank you so very much!!

schwindelbub commented 8 years ago

But a last question: Why did this error not occur on the old server? (@_@)

garvinhicking commented 8 years ago

@schwindelbub Good to hear. I believe it could be a change related to one of the German translation files (you and @VorobeY1326 are using German as well, I figure?). Maybe that file is mistakenly no longer UTF-8, or got transferred badly. Let's see what the devs say. If we can see where the actual string gets pulled from (I searched for "enth.{1,8}lt" in all files, but did not find this word), we could do more debugging.

@Viswan-piwki Can't help with your issue, I don't think it's related to this. You are having an upgrader problem, maybe open another issue. "UTF-8" is not a valid SQL charset, you should either leave out the charset= option completely, or use "utf8".

VorobeY1326 commented 8 years ago

@garvinhicking I'm using the Russian translation, so maybe the problem affects several translations.

VorobeY1326 commented 8 years ago

Changing the user language to English also fixes the problem.

schwindelbub commented 8 years ago

@garvinhicking Yes, I am using the German translation. The error occurred only if the profile's language is German. I think the "ä" is guilty :)

I searched in the docroot for the string in all files. Here are the results

find . -type f -exec grep -qi "enthält" {} \; -print0 | xargs -0 file

New Server
./lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Dashboard/lang/de.json: UTF-8 Unicode text
./plugins/DevicesDetection/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/UserCountry/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Installation/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Referrers/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Actions/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/CoreAdminHome/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/CustomVariables/lang/de.json: HTML document, UTF-8 Unicode text, with very long lines
./plugins/Live/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/SitesManager/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Goals/lang/de.json: UTF-8 Unicode text, with very long lines

Old Server
./lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Installation/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Referrers/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Actions/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/SitesManager/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/CustomVariables/lang/de.json: HTML document, UTF-8 Unicode text, with very long lines
./plugins/DevicesDetection/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Live/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Dashboard/lang/de.json: UTF-8 Unicode text
./plugins/CoreAdminHome/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/Goals/lang/de.json: UTF-8 Unicode text, with very long lines
./plugins/UserCountry/lang/de.json: UTF-8 Unicode text, with very long lines

The files look identical to me. Maybe it helps.
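
Since `file` guesses the encoding heuristically, a stricter check is to decode every language file in full. Below is a minimal Python sketch, assuming the docroot layout visible in the output above (`lang/de.json` and `plugins/*/lang/de.json`); `find_invalid_utf8` is a hypothetical helper, not a Piwik tool.

```python
import json
from pathlib import Path


def find_invalid_utf8(docroot: str, lang: str = "de") -> list[str]:
    """Return language files that are not valid UTF-8 or not valid JSON."""
    root = Path(docroot)
    candidates = [root / "lang" / f"{lang}.json"]
    candidates += sorted(root.glob(f"plugins/*/lang/{lang}.json"))
    bad = []
    for path in candidates:
        if not path.is_file():
            continue
        try:
            # Strict decode first, then parse, so both failure modes are reported.
            json.loads(path.read_bytes().decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            bad.append(f"{path}: {exc}")
    return bad


if __name__ == "__main__":
    # Run from the Piwik docroot; prints nothing if every file is clean.
    for line in find_invalid_utf8("."):
        print(line)
```

An empty result on both servers would support the conclusion that the language files themselves are not the cause.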

tsteur commented 8 years ago

I tried to reproduce this problem for a while but couldn't, even with German language. äöüß etc is displayed correctly.

Changing the user language to English also fixes the problem.

That's interesting. When switching to the German (or Russian) language, two things are different:

  1. It tries to load a different locale. In my case it tried to load de_DE.UTF-8 (installed was de_DE.utf8, so I also tried to reproduce it using that locale).
  2. It uses different language files. It probably works with the English language because its files don't contain the umlaut.

Can everyone check whether the German locale is installed on their server? E.g. via locale -a in a Linux shell.

But a last question: Why did this error not occur on the old server?

That would be interesting to know. Can you compare the installed locales, e.g. via locale -a in a Linux shell? Is the PHP version the same on both systems? Maybe you can execute php -i on both systems and compare the output.

Which PHP version is everyone using?

You can find out by e.g. running php --version

garvinhicking commented 8 years ago

I'm using a Windows IIS server; how do I figure out locales there? PHP didn't change with the upgrade. I only have limited access to the server, so I can't check the PHP version right now, but I believe it's 5.3. The IIS/PHP versions didn't change during the last Piwik upgrade, and it worked before that.
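
Where `locale -a` is not available, locale support can also be probed programmatically. The Python sketch below asks the C library whether it accepts a given locale name; the comparable probe in PHP would be a call to setlocale() and a check for a `false` return value. The candidate names, including the Windows-style `German_Germany.1252`, are assumptions, and `locale_available` is a hypothetical helper.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_TIME locale."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # always restore the original


for candidate in ("de_DE.UTF-8", "de_DE.utf8", "German_Germany.1252"):
    print(candidate, locale_available(candidate))
```

If only a non-UTF-8 German locale is accepted, locale-dependent output such as month names may come back in a single-byte encoding, which fits the ISO-versus-UTF-8 suspicion raised earlier in the thread.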

schwindelbub commented 8 years ago

As I told you above: https://github.com/piwik/piwik/issues/4410#issuecomment-199812967 But we tested the problem with the same PHP version, on CentOS.

And here are the locale results:

locale -a | grep -i de

New Server:
de_AT de_AT@euro de_AT.iso88591 de_AT.iso885915@euro de_AT.utf8 de_BE de_BE@euro de_BE.iso88591 de_BE.iso885915@euro de_BE.utf8 de_CH de_CH.iso88591 de_CH.utf8 de_DE de_DE@euro de_DE.iso88591 de_DE.iso885915@euro de_DE.utf8 de_LU de_LU@euro de_LU.iso88591 de_LU.iso885915@euro de_LU.utf8 deutsch fy_DE fy_DE.utf8 gez_ER@abegede gez_ER.utf8@abegede gez_ET@abegede gez_ET.utf8@abegede hsb_DE hsb_DE.iso88592 hsb_DE.utf8 ks_IN@devanagari ks_IN.utf8@devanagari nds_DE nds_DE.utf8 sd_IN@devanagari sd_IN.utf8@devanagari

Old Server:
de_DE.utf8

More locales are available on the new server.

VorobeY1326 commented 8 years ago

@tsteur I'm using PHP 5.6.8 on Windows server / IIS. Current system locale is Russian.

Cruiser13 commented 8 years ago

Having the same issues here with IIS and PHP 5.5.33, German locale.

tsteur commented 8 years ago

Maybe a random idea as it was mentioned it works with English language. I kind of want to find out whether it's related to the language file or eg the set locale.

Can someone replace the English language file with the German language file, switch language to English and see if it works?

Something like this

mv lang/en.json  lang/en.json.backup
cp lang/de.json lang/en.json

Then switch to English language and reload. You might also need to clear the cache directory in between:

rm -rf tmp/cache/*

After the test you can restore the correct english file by executing something like

mv lang/en.json.backup  lang/en.json

or

cp lang/en.json.backup  lang/en.json
rm lang/en.json.backup

garvinhicking commented 8 years ago

Nice idea. I'll test, but it'll take some time.

Just to be sure: the central lang/de.json file, not a lang file within a plugin directory, yes?

tsteur commented 8 years ago

Yes, the string that contains the previously mentioned enthält is in lang/de.json and not in a plugin directory.

schwindelbub commented 8 years ago

@tsteur As you can see here, the string is not only in the lang/de.json https://github.com/piwik/piwik/issues/4410#issuecomment-204275477

tsteur commented 8 years ago

As you can see here, the string is not only in the lang/de.json

True. I had a look in the code though and for this particular widget it should use the string from that file.

VorobeY1326 commented 8 years ago

Installed a fresh Piwik 2.16.0 on my local computer (Windows 7, IIS) and the issue was easily reproduced. I just logged several actions, added one segment named "IE 11", clicked "save and use it" and voila: crash. The Piwik language is Russian.

tsteur commented 8 years ago

Can you give https://github.com/piwik/piwik/issues/4410#issuecomment-205053449 a try or give us access to a server where we can debug the problem maybe? If so please let us know via email: hello at piwik.org.

garvinhicking commented 8 years ago

@tsteur Sorry it took some time. I was able to test now. Copying the English language file over the German one fixes the problem as well.

So I guess that the high-byte non-ASCII char in "enthält" must cause some sort of trouble in the PHP/IIS environment. Strangely, many other occurrences in the German language file cause no problem at all, so I guess this ->getHumanReadable() method maybe double-encodes the UTF-8 input at some point?
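
A note on the double-encoding hypothesis: text that has been UTF-8 encoded twice is garbled but still well-formed UTF-8, so double encoding alone would be expected to produce mojibake rather than this exception. The Python sketch below illustrates that general point; it is not a reproduction of what getHumanReadable() does.

```python
word = "enthält"

once = word.encode("utf-8")                        # "ä" -> C3 A4
# Double encoding: the UTF-8 bytes are mistaken for ISO-8859-1 text
# and then encoded to UTF-8 a second time.
twice = once.decode("iso-8859-1").encode("utf-8")  # "ä" -> C3 83 C2 A4

print(twice)                  # b'enth\xc3\x83\xc2\xa4lt'
print(twice.decode("utf-8"))  # enthÃ¤lt: wrong text, but it decodes without error
```

An "invalid UTF-8" error points more towards bytes in a single-byte encoding, or a multi-byte sequence that was cut or altered, reaching the template.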

tsteur commented 8 years ago

Thanks for that @garvinhicking

In https://github.com/piwik/piwik/blob/2.16.1/plugins/SegmentEditor/templates/_segmentSelector.twig#L2 can you try to replace {{ 'SegmentEditor_CurrentlySelectedSegment'|translate(segmentDescription)|e('html_attr') }} with {{ 'SegmentEditor_CurrentlySelectedSegment'|translate(segmentDescription|raw)|e('html_attr') }}. Basically it adds the |raw before the escaping.

schwindelbub commented 8 years ago

@tsteur This patch has no effect on my Piwik (now 2.16.1). I still get the error if the profile language is "Deutsch".

And sorry, but i can't give you access to my server.

garvinhicking commented 8 years ago

@tsteur Sadly, same for me. Doesn't fix the issue, same error.

Cruiser13 commented 8 years ago

+1 with a non-fixed piwik including that patch.