matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.8k stars 2.64k forks source link

API call Contents.getContentNames randomly buggy #15628

Open feikede opened 4 years ago

feikede commented 4 years ago

This call randomly returns an empty [] json. I.e. currently:

/?method=Contents.getContentNames&label=anwalt%3A6122&date=2020-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e...384

returns [] with headers


Cache-Control | must-revalidate
-- | --
Connection | Keep-Alive
Content-Length | 2
Content-Type | application/json; charset=utf-8
Date | Thu, 27 Feb 2020 08:55:27 GMT
Keep-Alive | timeout=5, max=200
Server | Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.3.10
X-Matomo-Request-Id | 85f8e
X-Powered-By | PHP/7.3.10

While the exact same call with segmentation by label returns a (the correct) result: /?method=Contents.getContentNames&segment=contentName==anwalt%3A6122&label=anwalt%3A6122&date=2020-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e...


Cache-Control | must-revalidate
-- | --
Connection | Keep-Alive
Content-Length | 201
Content-Type | application/json; charset=utf-8
Date | Thu, 27 Feb 2020 08:46:51 GMT
Keep-Alive | timeout=5, max=200
Server | Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.3.10
X-Matomo-Request-Id | 5fd41
X-Powered-By | PHP/7.3.10

[{"label":"anwalt:6122","nb_visits":6,"nb_impressions":10,"nb_interactions":10,"sum_daily_nb_uniq_visitors":5,"interaction_rate":"100\u00a0%","segment":"contentName==anwalt%3A6122","idsubdatatable":1}]

Think this is a bug, or am I missing something?

BTW: This seems to be related to any kind of caching. Sometimes the call WITH segmentation delivers wrong results - sometimes without - and sometimes it's all gone one day later (have the cron running every hour, have about 10K req/day). Have latest V 3.13.3 installed.

tsteur commented 4 years ago

Random thought here.

I haven't tried to reproduce this. Looking at the code we would use the most recent invalidated archive though because of https://github.com/matomo-org/matomo/blob/3.13.3/core/DataAccess/Model.php#L243 and https://github.com/matomo-org/matomo/blob/3.13.3/core/DataAccess/ArchiveSelector.php#L128-L136 where we should use the first idArchive which is the most recent. Quickly looked into it and would say it almost can't be that.

Could be some other race condition as I've never heard of it before. Like not sure if at some point maybe a done flag is written to early for example.

@feikede do you know if you have browser archiving enabled? You can see this in Admin -> General Settings. Also for how long is the empty result returned if you know? Like are we talking few seconds here or few minutes? Thanks for mentioning the issue and your help 👍

feikede commented 4 years ago

@tsteur Thanks for your analysis. As for your questions:

Is it normal to have about 2500 tables in the database?

Maybe the matomo-archive.log helps?

INFO [2020-02-28 12:05:01] 26904  ---------------------------
INFO [2020-02-28 12:05:01] 26904  INIT
INFO [2020-02-28 12:05:01] 26904  Running Matomo 3.13.3 as Super User
INFO [2020-02-28 12:05:01] 26904  ---------------------------
INFO [2020-02-28 12:05:01] 26904  NOTES
INFO [2020-02-28 12:05:01] 26904  - Async process archiving supported, using CliMulti.
INFO [2020-02-28 12:05:01] 26904  - Reports for today will be processed at most every 900 seconds. You can change this value in Matomo UI > Settings > General Settings.
INFO [2020-02-28 12:05:01] 26904  - Reports for the current week/month/year will be requested at most every 3600 seconds.
INFO [2020-02-28 12:05:01] 26904  - Archiving was last executed without error 59 min 54s ago
INFO [2020-02-28 12:05:01] 26904  - Will invalidate archived reports for 2020-02-28 for following websites ids: 1
INFO [2020-02-28 12:05:01] 26904  ---------------------------
INFO [2020-02-28 12:05:01] 26904  START
INFO [2020-02-28 12:05:01] 26904  Starting Matomo reports archiving...
INFO [2020-02-28 12:05:01] 26904  - tracking data found for website id 1 since 2020-02-28 11:05:07 UTC (since the last successful archiving)
INFO [2020-02-28 12:05:01] 26904  Will pre-process for website id = 1, period = day, date = last1
INFO [2020-02-28 12:05:01] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:02] 26904  Archived website id = 1, period = day, 0 segments, 472 visits in last 2 days, 113 visits today, Time elapsed: 1.200s
INFO [2020-02-28 12:05:03] 26904  Will pre-process for website id = 1, period = week, date = last1
INFO [2020-02-28 12:05:03] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:04] 26904  Archived website id = 1, period = week, 0 segments, 0 visits in last 2 weeks, 0 visits this week, Time elapsed: 1.172s
INFO [2020-02-28 12:05:04] 26904  Will pre-process for website id = 1, period = month, date = last1
INFO [2020-02-28 12:05:04] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:05] 26904  Archived website id = 1, period = month, 0 segments, 0 visits in last 2 months, 0 visits this month, Time elapsed: 1.231s
INFO [2020-02-28 12:05:05] 26904  Will pre-process for website id = 1, period = year, date = last1
INFO [2020-02-28 12:05:05] 26904  - pre-processing all visits
INFO [2020-02-28 12:05:06] 26904  Archived website id = 1, period = year, 0 segments, 0 visits in last 2 years, 0 visits this year, Time elapsed: 1.354s
INFO [2020-02-28 12:05:06] 26904  Archived website id = 1, 4 API requests, Time elapsed: 4.972s [1/1 done]
INFO [2020-02-28 12:05:06] 26904  Done archiving!
INFO [2020-02-28 12:05:06] 26904  ---------------------------
INFO [2020-02-28 12:05:06] 26904  SUMMARY
INFO [2020-02-28 12:05:06] 26904  Total visits for today across archived websites: 113
INFO [2020-02-28 12:05:06] 26904  Archived today's reports for 1 websites
INFO [2020-02-28 12:05:06] 26904  Archived week/month/year for 1 websites
INFO [2020-02-28 12:05:06] 26904  Skipped 0 websites
INFO [2020-02-28 12:05:06] 26904  - 0 skipped because no new visit since the last script execution
INFO [2020-02-28 12:05:06] 26904  - 0 skipped because existing daily reports are less than 900 seconds old
INFO [2020-02-28 12:05:06] 26904  - 0 skipped because existing week/month/year periods reports are less than 3600 seconds old
INFO [2020-02-28 12:05:06] 26904  Total API requests: 4
INFO [2020-02-28 12:05:06] 26904  done: 1/1 100%, 113 vtoday, 1 wtoday, 1 wperiods, 4 req, 5060 ms, no error
INFO [2020-02-28 12:05:06] 26904  Time elapsed: 5.060s
INFO [2020-02-28 12:05:06] 26904  ---------------------------
INFO [2020-02-28 12:05:06] 26904  SCHEDULED TASKS
INFO [2020-02-28 12:05:06] 26904  Starting Scheduled tasks... 
INFO [2020-02-28 12:05:06] 26904  Scheduler: executing task Piwik\Plugins\CustomPiwikJs\Tasks.updateTracker...
INFO [2020-02-28 12:05:06] 26904  Scheduler: finished. Time elapsed: 0.001s
INFO [2020-02-28 12:05:06] 26904  Scheduler: executing task Piwik\Plugins\PrivacyManager\Tasks.deleteLogData...
INFO [2020-02-28 12:05:06] 26904  Scheduler: finished. Time elapsed: 0.001s
INFO [2020-02-28 12:05:06] 26904  Scheduler: executing task Piwik\Plugins\PrivacyManager\Tasks.anonymizePastData...
INFO [2020-02-28 12:05:06] 26904  Scheduler: finished. Time elapsed: 0.000s
INFO [2020-02-28 12:05:06] 26904  done
INFO [2020-02-28 12:05:06] 26904  ---------------------------

I'tt try to track down the next case in detail.

tsteur commented 4 years ago

2500 tables is not quite normal, is it maybe a rather old installation? Possible there was a bug a while ago.

Not available for few hours doesn't sound normal. Is in the same time the result not available in the UI either?

feikede commented 4 years ago

Ah, sorry, it's only 107 tables, I looked at the wrong number. And the result is available in the UI while it's not correct in the reporting API at the same time. I'll try to find another case today to track that down.

tsteur commented 4 years ago

Thanks @feikede . Is there maybe any caching active in your server set up or so? I'm asking because the UI is also just using the API and I can't think of a reason why the UI should be showing data while the API doesn't. Not sure what your server setup looks like or if your webserver sends maybe some caching headers.

feikede commented 4 years ago

No, there's no caching, it's a plain matomo LAMP install on CentOS 7.7. I have another case today (exactly now): /?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3

returns empty array [], while

/?method=Contents.getContentNames&segment=contentName==anwalt:2695&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3

returns

[{"label":"anwalt:2695","nb_visits":10,"nb_impressions":12,"nb_interactions":2,"sum_daily_nb_uniq_visitors":10,"interaction_rate":"16,67\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":1}]

and

/?method=Contents.getContentNames&segment=contentName==anwalt:2695&label=anwalt:2695&date=last30&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3

returns

{"2020-02-04":[],"2020-02-05":[],"2020-02-06":[],"2020-02-07":[],"2020-02-08":[],"2020-02-09":[],"2020-02-10":[],"2020-02-11":[],"2020-02-12":[],"2020-02-13":[],"2020-02-14":[],"2020-02-15":[],"2020-02-16":[],"2020-02-17":[],"2020-02-18":[],"2020-02-19":[],"2020-02-20":[],"2020-02-21":[],"2020-02-22":[],"2020-02-23":[],"2020-02-24":[],"2020-02-25":[],"2020-02-26":[],"2020-02-27":[],"2020-02-28":[],"2020-02-29":[],"2020-03-01":[],"2020-03-02":[],"2020-03-03":[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":1}],"2020-03-04":[{"label":"anwalt:2695","nb_uniq_visitors":1,"nb_visits":1,"nb_impressions":3,"nb_interactions":2,"interaction_rate":"66,67\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":1}]}

while

/?method=Contents.getContentNames&label=anwalt:2695&date=last30&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93..3 returns

{"2020-02-04":[],"2020-02-05":[],"2020-02-06":[],"2020-02-07":[],"2020-02-08":[],"2020-02-09":[],"2020-02-10":[],"2020-02-11":[],"2020-02-12":[],"2020-02-13":[],"2020-02-14":[],"2020-02-15":[],"2020-02-16":[],"2020-02-17":[],"2020-02-18":[],"2020-02-19":[],"2020-02-20":[],"2020-02-21":[],"2020-02-22":[],"2020-02-23":[],"2020-02-24":[],"2020-02-25":[],"2020-02-26":[],"2020-02-27":[],"2020-02-28":[],"2020-02-29":[],"2020-03-01":[],"2020-03-02":[],"2020-03-03":[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":39}],"2020-03-04":[]}

The UI shows 2 Impressions on 3.3.20 (4.3 is missing here, and no impr. before march) - and it's totally different from above's "label":"anwalt:2695","nb_visits":10,"nb_impressions":12

So in summary, this seems pretty useless. Do you know what's the suggested way to count content impressions and API-report them in matomo?

tsteur commented 4 years ago

@sgiehl @diosmosis maybe any idea right away how UI and API might report different results?

diosmosis commented 4 years ago

@feikede Does this only happen with dates that include today? Or does it happen w/ earlier dates? Does this affect the export link provided by the UI as well? Does it only affect ranges?

@tsteur I don't know why the UI would and API would report different results, seems very strange, but seems likely related to range archiving in some way.

feikede commented 4 years ago

Today /?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e3.. still returns [], matching access_log entry is 2.x.y.180 - - [05/Mar/2020:08:41:45 +0100] "GET /?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,today&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=yy HTTP/1.1" 200 2 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0"

@diosmosis /?method=Contents.getContentNames&label=anwalt:2695&date=2019-01-01,2020-03-04&period=range&format=JSON&module=API&idSite=1&format=JSON&token_auth=93.. also returns []

While /?method=Contents.getContentNames&label=anwalt:2695&date=last5&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e3.. returns {"2020-03-01":[],"2020-03-02":[],"2020-03-03":[{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":39}],"2020-03-04":[{"label":"anwalt:2695","nb_uniq_visitors":1,"nb_visits":1,"nb_impressions":3,"nb_interactions":2,"interaction_rate":"66,67\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":145}],"2020-03-05":[]} now.

@diosmosis How should I test the "export link" issue?

feikede commented 4 years ago

btw: /?method=Contents.getContentNames&label=anwalt:2695&date=2020-03-03&period=day&format=JSON&module=API&idSite=1&format=JSON&token_auth=93e returns [{"label":"anwalt:2695","nb_uniq_visitors":2,"nb_visits":2,"nb_impressions":2,"nb_interactions":0,"interaction_rate":"0\u00a0%","segment":"contentName==anwalt%3A2695","idsubdatatable":39}] maybe it's a period=range issue?

diosmosis commented 4 years ago

@feikede at the bottom of each graph there is an icon that can be used to get an API URL for a report:

image

clicking on it would show:

image

The link it generates should be directly to the API, I'm wondering if this feature is broken for you as well for content reports.

maybe it's a period=range issue?

It very well could be, though it's still extremely strange, and I'm not immediately sure what could be causing it.

feikede commented 4 years ago

Today, on 2020-03-31, I have the following buggy behavior:

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,today&period=range&format=JSON empty result

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,2020-03-30&period=range&format=JSON empty result

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,2020-03-29&period=range&format=JSON correct result ([{"label":"anwalt:6122","nb_visits":7,"nb_impressions":11,"nb_interactions":10,"sum_daily_nb_uniq_visitors":6,"interaction_rate":"90,91\u00a0%","segment":"contentName==anwalt%3A6122","idsubdatatable":844}])

/?method=Contents.getContentNames&label=anwalt:6122&date=2019-01-01,2020-03-28&period=range&format=JSON correct result ([{"label":"anwalt:6122","nb_visits":7,"nb_impressions":11,"nb_interactions":10,"sum_daily_nb_uniq_visitors":6,"interaction_rate":"90,91\u00a0%","segment":"contentName==anwalt%3A6122","idsubdatatable":844}])

etc.

I give up using this now, it's not reliable. Thanks for your help

tsteur commented 4 years ago

Sorry to hear @feikede . We would be maybe able to have a look should we have full server access (incl db) but otherwise this be hard to identify the issue. We can't possibly imagine how this is happening since Matomo uses the same API's in the reporting UI as well. It does somehow suggest there is some issue somewhere else in the server setup but of course there could be also just as well be some edge case issue in Matomo.

twi123 commented 4 years ago

Think i have similar issue with 3.14.1: /index.php?module=Contents&action=getContentNames&idSite=6&period=day&date=2020-09-30&segment=&showtitle=1&random=4231 and I got this:

Content Name Visits Unique visitors Actions Actions per Visit Avg. Time on Website Bounce Rate Conversion Rate
Вася опубликовал новость от 13.13.2013 50% 3 50% 3 - - 0s 0% 0%
Публикация 50% 3 50% 3 - - 0s 0% 0%
Totals 100% 6 100% 6 - - - - -

Note on empty "actions" column

But with this /index.php?module=API&method=Contents.getContentNames&idSite=6&period=day&date=2020-09-30&segment=&format=HTML&token_auth=... i got this:

label nb_uniq_visitors nb_visits nb_impressions nb_interactions interaction_rate _metadata _idSubtable
Вася опубликовал новость от 13.13.2013 3 3 6 11 183.33% 'segment' => 'contentName==%D0%92%D0%B0%D1%81%D1%8F+%D0%BE%D0%BF%D1%83%D0%B1%D0%BB%D0%B8%D0%BA%D0%BE%D0%B2%D0%B0%D0%BB+%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D1%8C+%D0%BE%D1%82+13.13.2013' 1
Публикация 3 3 5 6 120% 'segment' => 'contentName==%D0%9F%D1%83%D0%B1%D0%BB%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D1%8F' 2

Note on nb_interactions column (nevermind of too much interactions -- result of manual _paq.push('trackContentInteractions'...) calls via js console for testing purposes)

As I can see, a call method getContentNames via module Contents returns different results with calling method Contents.getContentNames from module API. Is it normal? Or I miss something?

twi123 commented 4 years ago

Or it's just something like a mess in column names in report UI (i.e. nb_actions instead of nb_interactions)?

tsteur commented 4 years ago

fyi replied in https://github.com/matomo-org/matomo/issues/16505