matomo-org / matomo

https://matomo.org/
GNU General Public License v3.0

The string to escape is not a valid UTF-8 string. #18763

Open peterhashair opened 2 years ago

peterhashair commented 2 years ago

Expected Behavior

I got this error when accessing this URL:

/?module=Widgetize&action=iframe&moduleToWidgetize=Referrers&idSite=1&period=year&date=2022-01-20&actionToWidgetize=getKeywords&viewDataTable=table&filter_limit=5&isFooterExpandedInDashboard=1
The string to escape is not a valid UTF-8 string.
in /Users/bagduch/Work/php/matomo/plugins/Live/templates/_actionCommon.twig line 26          

It seems like there is a special URL involved.

sgiehl commented 2 years ago

@peterhashair I guess you are using UTF8MB4? Could you check whether those "broken" characters were stored broken in the database? Did you track them manually somehow, or were they generated by the visitor generator plugin?
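One way to follow up on this question is to read the raw bytes of the stored values and test whether they decode as UTF-8. The sketch below is only an illustration, not Matomo code: the helper names and the sample rows are hypothetical, and it assumes the values have already been fetched from the database as raw bytes.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_utf8(rows):
    """Return the ids of rows whose value is not valid UTF-8.

    `rows` is an iterable of (row_id, raw_bytes) pairs; this shape is an
    assumption made for the example, not a Matomo data structure.
    """
    return [row_id for row_id, raw in rows if not is_valid_utf8(raw)]


# Hypothetical sample data: two valid values and one with a truncated
# multi-byte sequence (a lead byte 0xCF followed by a plain ASCII byte).
sample_rows = [
    (1, "example.org/page".encode("utf-8")),
    (2, "caf\u00e9".encode("utf-8")),
    (3, b"\xcf_broken"),
]

print(find_invalid_utf8(sample_rows))
```

If every stored value passes such a check, the damage is more likely introduced after the data leaves the database than at tracking time.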

ZE3kr commented 2 years ago

I got the same error on Matomo (v4.8.0)

The string to escape is not a valid UTF-8 string.
in /var/www/matomo/plugins/Live/templates/_actionCommon.twig line 25     

I am using UTF8MB4, which I think is the default. I migrated between two servers a while ago, but I kept the config file the same.

sgiehl commented 2 years ago

@ZE3kr Are you tracking any URLs that contain special characters?

ZE3kr commented 2 years ago

@sgiehl: "@ZE3kr Are you tracking any URLs that contain special characters?"

Actually, I am not sure. The only useful information on the error page is what I posted above. If I enable debug mode, can I see whether I am tracking special URLs?

My website has no special URLs. However, that does not stop someone from including special characters in a query string or requesting a non-existent page.

Frederic-P commented 1 year ago

I've been facing the same problem for a while and tracked down the offending link in the database. This word, as part of a valid search query, causes the error 500: χριβιορε. When I look at the string in MySQL it is valid. The value is stored in the name field of the log_action table, which uses the collation utf8mb4_0900_ai_ci. When I modified the Twig file and added {{ dump(action.url) }}, the string was shown as χ�_ιβιο�_ε

I validated this analysis by changing the value χριβιορε to "test" and rendering the page. This rendered with status code 200. Reverting my change to χριβιορε re-introduced the problem. I did not update the hash column; it seemed to accept my change to the log_action field without updating the hash.

I copied the string into Notepad and reintroduced it into the database, but the problem persisted. Then I ran that string through Babelstone, which returned the following output:
U+03C7 : GREEK SMALL LETTER CHI
U+03C1 : GREEK SMALL LETTER RHO
U+03B9 : GREEK SMALL LETTER IOTA
U+03B2 : GREEK SMALL LETTER BETA
U+03B9 : GREEK SMALL LETTER IOTA
U+03BF : GREEK SMALL LETTER OMICRON
U+03C1 : GREEK SMALL LETTER RHO
U+03B5 : GREEK SMALL LETTER EPSILON
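For anyone who wants to repeat this check without an online tool, the same code point listing can be produced locally. This is a small sketch using Python's standard unicodedata module; the function name describe is just a label chosen for the example.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX : NAME' line per character in text."""
    return [f"U+{ord(ch):04X} : {unicodedata.name(ch)}" for ch in text]


for line in describe("χριβιορε"):
    print(line)
```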

The offending character is the letter RHO, but I can't see why.

I finally tested my assumption by changing χριβιορε to χιβιοε - (removing the RHO letter). This resulted again in a working page. Inverting my changes so that the database only stored the RHO letter reintroduced the fatal error.
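Looking at the raw bytes may help narrow this down. In UTF-8, RHO (U+03C1) is encoded as CF 81, and it is the only letter in this word whose encoding contains the byte 0x81. The sketch below prints the bytes per letter and then simulates, purely as an assumption, a step that replaces every 0x81 byte with an underscore. Nothing in this thread confirms that Matomo or the database driver does this; the point is only that such a byte-level replacement reproduces the dumped string exactly.

```python
WORD = "χριβιορε"


def utf8_bytes_per_char(text):
    """Map each distinct character to the hex bytes of its UTF-8 encoding."""
    return {
        ch: " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
        for ch in text
    }


def simulate_0x81_replacement(text):
    """Hypothetical damage: replace each 0x81 byte with '_' and decode
    leniently, so invalid sequences become U+FFFD (shown as a '?' box)."""
    damaged = text.encode("utf-8").replace(b"\x81", b"_")
    return damaged.decode("utf-8", errors="replace")


for ch, hex_bytes in utf8_bytes_per_char(WORD).items():
    print(ch, hex_bytes)

# Which letters contain the byte 0x81?
print([ch for ch, hx in utf8_bytes_per_char(WORD).items() if "81" in hx.split()])

# The damaged word, and the same word without RHO for comparison.
print(simulate_0x81_replacement(WORD))
print(simulate_0x81_replacement("χιβιοε"))
```

The simulated result for the full word is χ, U+FFFD, underscore, ιβιο, U+FFFD, underscore, ε, which matches the dump output, while the word without RHO passes through unchanged. After such a replacement the byte CF is left without its continuation byte, so the string is no longer valid UTF-8, which is the condition the escape error reports.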

courtens commented 1 year ago

I am still getting this error.

\templates\_actionCommon.twig(25): The string to escape is not a valid UTF-8 string. 
[Query: ?date=2023-02-07,2023-02-08&module=Live&format=html&forceView=1&viewDataTable=VisitorLog&action=getLastVisitsDetails&small=1&idSite=2&period=range&segment=&widget=&showtitle=1&random=8716, CLI mode: 0]

Could this be the problem? Database Collation is set to utf8mb4_general_ci and config.ini.php is set to charset = "utf8mb4"

Matomo version: 4.13.3
MySQL version: 10.6.5-MariaDB
PHP version: 8.1.10

The problem manifests itself as explained in this related post https://github.com/matomo-org/matomo/issues/19955#issue-1437436942

UPDATE: I updated my HTML code and the error went away. It must somehow be related to the way Matomo interprets the HTML page. The new code has:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

I am not sure if the old code had the above meta tag, but I do know that the old code was using the exact same script to connect to matomo.

bx80 commented 1 year ago

@Stan-vw This has been mentioned a few times now and may be a regression with how encoding is handled. Could we review the priority with a view to scheduling it in a future sprint?

atom-box commented 1 year ago

Our user had this error once yesterday and once earlier this year:

(Version + Date):
4.13.3 on February 14, 2023: The string to escape is not a valid UTF-8 string. in D:\www\piwik\plugins\Live\templates\_actionCommon.twig 25 using PHP 8.1.10
4.15.1 on October 26, 2023: The string to escape is not a valid UTF-8 string. in D:\www\piwik\plugins\Live\templates\_actionCommon.twig 25 using PHP 8.2.9

courtens commented 1 year ago

I am now also getting errors on other pages, and with other sites.

My tables are set to utf8mb4_unicode_ci and I changed utf8 to utf8mb4 in global.ini.php

global.ini.php
[database]
;  (was set to utf8 before)
charset = utf8mb4

The following error just broke Matomo (v4.15.1):

The string to escape is not a valid UTF-8 string.

in ...\plugins\Live\templates\getLastVisitsStart.twig line 76
in ...\plugins\Live\templates\_actionCommon.twig line 25

Further troubleshooting

If this error continues to happen, you may be able to fix this issue by disabling one or more of the Third-Party plugins. If you don't know which plugin is causing this error, we recommend to first disable any plugin not created by "Matomo" and not created by "InnoCraft". You can enable plugin again afterwards in the Plugins or Themes page under settings at any time.

CustomVariables 4.1.3
DevicePixelRatio 2.0.1
MarketingCampaignsReporting 4.1.3
ReferrersManager 4.0.4
TreemapVisualization 4.0.2

If this error still occurs after disabling all plugins, you might want to consider uninstalling some plugins. Keep in mind: The plugin will be completely removed from your platform.

Provider
VisitorGenerator

courtens commented 1 year ago

I would like to see this also flagged as an "Accessibility", a "Design / UI", and elevated to a "Bug". It should be a simple fix.

bx80 commented 1 year ago

The accessibility tag is for issues that affect usage of assistive technologies, such as screen readers, so it doesn't apply here. The design / UI tag is for issues requiring a visual change to the UI, which isn't the case here.

@courtens: "It should be a simple fix."

This issue has now been scheduled into an upcoming sprint, but we always welcome pull requests.

bx80 commented 1 year ago

Related issue: https://github.com/matomo-org/matomo/issues/10083

courtens commented 11 months ago

I am still getting:

The string to escape is not a valid UTF-8 string. in \plugins\Live\templates\getLastVisitsStart.twig 76 using PHP 8.2.9

Matomo version: 4.16.0
MySQL version: 10.6.15-MariaDB
PHP version: 8.2.9

courtens commented 11 months ago

FYI, I am now on Matomo v5.0.0 and got the error

The string to escape is not a valid UTF-8 string. in D:... ...\plugins\Live\templates\_actionCommon.twig line 25

on page /index.php?date=previous30&module=Live&format=html&forceView=1&viewDataTable=VisitorLog&action=getLastVisitsDetails&small=1&idSite=2&period=range&segment=&widget=&showtitle=1&random=7595

My plugins are:
CustomVariables 5.0.2
DevicePixelRatio 2.0.1
MarketingCampaignsReporting 5.0.2
ReferrersManager 5.0.1
TreemapVisualization 5.0.0