matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Using period=range, an empty list is returned if there is no data for one date in the date range #19460

Closed. Situphen closed this issue 4 months ago.

Situphen commented 2 years ago

I am trying to get some statistics, such as nb_visits, for a specific label (corresponding to a URL) over a specific range of dates, but sometimes Matomo's response is just an empty list. While investigating, I found that if there is no data for one date in the date range, the response is an empty list.

Request with period=day

Request parameters:

/index.php?module=API&format=JSON&idSite=6&period=day&date=2022-06-28,2022-07-02&method=Actions.getPageUrls&label=tutoriels+>+3645+>+demontrer-par-labsurde

Current and correct response:

{
  "2022-06-28": [],
  "2022-06-29": [],
  "2022-06-30": [
    {
      "label": "demontrer-par-labsurde",
      "nb_visits": 8,
      ...
    }
  ],
  "2022-07-01": [
    {
      "label": "demontrer-par-labsurde",
      "nb_visits": 7,
      ...
    }
  ],
  "2022-07-02": [
    {
      "label": "demontrer-par-labsurde",
      "nb_visits": 3,
      ...
    }
  ]
}

Request with period=range

Request parameters:

/index.php?module=API&format=JSON&idSite=6&period=range&date=2022-06-28,2022-07-02&method=Actions.getPageUrls&label=tutoriels+>+3645+>+demontrer-par-labsurde
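Because part of the discussion below turns on whether the label is being matched exactly, it may help to generate both requests programmatically instead of hand-encoding them. This is a minimal sketch that rebuilds the two request URLs from this report with Python's standard `urllib.parse`; `BASE_URL` and the helper name `page_urls_request` are hypothetical and not part of Matomo.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL; replace with your own Matomo instance.
BASE_URL = "https://matomo.example.org"


def page_urls_request(id_site: int, period: str, date_range: str, label: str) -> str:
    """Build an Actions.getPageUrls request URL with a properly encoded label."""
    params = {
        "module": "API",
        "format": "JSON",
        "idSite": id_site,
        "period": period,
        "date": date_range,
        "method": "Actions.getPageUrls",
        "label": label,  # hierarchy levels separated by ">"
    }
    # urlencode() percent-encodes ">" and ",", and turns spaces into "+".
    return f"{BASE_URL}/index.php?{urlencode(params)}"


label = "tutoriels > 3645 > demontrer-par-labsurde"
day_url = page_urls_request(6, "day", "2022-06-28,2022-07-02", label)
range_url = page_urls_request(6, "range", "2022-06-28,2022-07-02", label)

# Decoding the query string gives back exactly the label that was sent.
print(parse_qs(urlsplit(range_url).query)["label"])
```

The only difference between the two URLs is the `period` value, which keeps the comparison between `period=day` and `period=range` fair.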

Current Behavior

Current response:

[]

Expected Behavior

Correct response:

[
  {
    "label": "demontrer-par-labsurde",
    "nb_visits": 18,
    ...
  }
]
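The expected value of 18 is simply the sum of the three non-empty daily rows (8 + 7 + 3). A minimal sketch of that check, using a trimmed copy of the `period=day` response above (other metrics dropped); the helper name `sum_daily` is hypothetical:

```python
import json

# Trimmed copy of the period=day response above (other metrics omitted).
DAILY_RESPONSE = json.loads("""
{
  "2022-06-28": [],
  "2022-06-29": [],
  "2022-06-30": [{"label": "demontrer-par-labsurde", "nb_visits": 8}],
  "2022-07-01": [{"label": "demontrer-par-labsurde", "nb_visits": 7}],
  "2022-07-02": [{"label": "demontrer-par-labsurde", "nb_visits": 3}]
}
""")


def sum_daily(response: dict, label: str, metric: str = "nb_visits") -> int:
    """Sum one metric for one label across all days of a period=day response."""
    total = 0
    for rows in response.values():  # one list of rows per date; empty days add nothing
        for row in rows:
            if row.get("label") == label:
                total += row.get(metric, 0)
    return total


expected = sum_daily(DAILY_RESPONSE, "demontrer-par-labsurde")
print(expected)  # 18
```

Empty days contribute nothing to the sum, so a day without data should not, by itself, make the range total disappear.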

Steps to Reproduce (for Bugs)

I guess any request with method=Actions.getPageUrls, a valid label, period=range, and a date range that has at least one date without data.

Context

See the first paragraph.

Your Environment

peterhashair commented 2 years ago

@Situphen thank you for reporting this, our product team will review it.

justinvelluppillai commented 2 years ago

@peterhashair it'd be good to confirm we can reproduce this issue and check it's not a regression.

peterhashair commented 2 years ago

@Situphen I tried to reproduce this, but it seems to work locally. I notice your label param is tutoriels+>+3645+>+demontrer-par-labsurde, while the label in the actual response is demontrer-par-labsurde. Can you try a request with label=demontrer-par-labsurde?

Situphen commented 2 years ago

I tried a request with label=demontrer-par-labsurde as you suggested and got an empty response: {"2022-06-28":[],"2022-06-29":[],"2022-06-30":[],"2022-07-01":[],"2022-07-02":[]} with period=day and [] with period=range.

peterhashair commented 2 years ago

@Situphen how about making the request without label and searching the page (Ctrl+F) for demontrer, just to confirm what the actual label in the response is? It could be caused by special characters.

Situphen commented 2 years ago

With period=day

Request parameters without label=... but with expanded=1:

/index.php?module=API&format=JSON&idSite=6&period=day&date=2022-06-28,2022-07-02&method=Actions.getPageUrls&expanded=1

Correct response (when filtering for demontrer in Firefox: 3 results with the label demontrer-par-labsurde, as before, and 2 with the label /demontrer-par-labsurde.pdf):

[Screenshot of the filtered response]

With period=range

Request parameters without label=... but with expanded=1:

/index.php?module=API&format=JSON&idSite=6&period=range&date=2022-06-28,2022-07-02&method=Actions.getPageUrls&expanded=1

Incorrect response (no matches when filtering for demontrer in Firefox):

[Screenshot of the filtered response]
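Searching the raw JSON with Ctrl+F works, but the same check can be scripted. This sketch walks an `expanded=1` response and prints the full label path of every row whose label contains a substring. It assumes, without confirmation from this thread, that expanded child rows are nested under a `"subtable"` key; the `sample` data and the helper name `find_labels` are hypothetical and heavily trimmed.

```python
from typing import Iterator


def find_labels(rows: list, needle: str, path: tuple = ()) -> Iterator[str]:
    """Yield the full " > "-joined label path of every row whose label contains needle."""
    for row in rows:
        label = str(row.get("label", ""))
        current = path + (label,)
        if needle in label:
            yield " > ".join(current)
        # Assumption: with expanded=1, child rows are nested under "subtable".
        yield from find_labels(row.get("subtable", []), needle, current)


# Hypothetical, heavily trimmed response with the same nesting as the label path.
sample = [
    {"label": "tutoriels", "subtable": [
        {"label": "3645", "subtable": [
            {"label": "demontrer-par-labsurde", "nb_visits": 8},
        ]},
    ]},
    {"label": "/demontrer-par-labsurde.pdf"},
]

for full_label in find_labels(sample, "demontrer"):
    print(full_label)
```

Printing the full path, rather than only the leaf label, makes it easy to compare with the value passed in the `label` parameter.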

peterhashair commented 2 years ago

@Situphen thanks for providing this, I will investigate further.

Situphen commented 2 years ago

As you were not able to reproduce this behavior, I investigated further: for some date ranges it works and for others it doesn't. So I used a Python script (get_data_range.py) to download the data with period=day over a 30-day date range (files data_day_*.json), as well as the data with period=range for every possible date range combination (files data_range_*.json). For each date range, I compared the nb_visits returned by period=range with the sum of the daily nb_visits using another Python script (analysis.py), and sorted the date ranges into two lists, correct and incorrect (files analysis_*.json). Everything is in the zip file below. I was not able to find a strict pattern common to all incorrect date ranges, but maybe you will?
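The comparison described above can be sketched in a few lines. This is not the contents of `get_data_range.py` or `analysis.py` (those are only in the attached zip); it is a hypothetical reconstruction of the idea, assuming single-day ranges are counted, with illustrative helper names `date_ranges` and `classify`. The usage below only uses the figures already reported in this issue.

```python
from datetime import date, timedelta
from itertools import combinations_with_replacement


def date_ranges(start: date, n_days: int) -> list[tuple[date, date]]:
    """All (first, last) pairs with first <= last inside an n-day window."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    return list(combinations_with_replacement(days, 2))


def classify(daily: dict[date, int], ranged: dict[tuple[date, date], int]):
    """Split ranges by whether period=range nb_visits equals the sum of daily nb_visits."""
    correct, incorrect = [], []
    for (first, last), nb_visits in ranged.items():
        expected = sum(v for d, v in daily.items() if first <= d <= last)
        (correct if nb_visits == expected else incorrect).append((first, last))
    return correct, incorrect


# Daily nb_visits reported above for demontrer-par-labsurde.
daily = {
    date(2022, 6, 28): 0, date(2022, 6, 29): 0, date(2022, 6, 30): 8,
    date(2022, 7, 1): 7, date(2022, 7, 2): 3,
}
# The one period=range result reported above: an empty list, i.e. 0 visits.
ranged = {(date(2022, 6, 28), date(2022, 7, 2)): 0}

correct, incorrect = classify(daily, ranged)
print(len(date_ranges(date(2022, 6, 28), 30)))  # 465
print(incorrect)
```

With single-day ranges included, a 30-day window yields 30 × 31 / 2 = 465 ranges to request and compare.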

test-matomo.zip

peterhashair commented 2 years ago

@Situphen thank you very much for providing the additional info and scripts, it really helps. I will get back to you ASAP.

sgiehl commented 2 years ago

@Situphen I haven't looked at this in detail, but it might be caused by data truncation. When reports are archived, the aggregated data is limited to a certain number of rows. For actions, the default is 500 rows for the top-level report and 100 rows for each subtable. Depending on the number of pages you track, rarely visited pages may be summarized into an "Others" row. So in theory a page can appear in every daily report, but if the set of tracked pages varies from day to day, a longer period (e.g. week, month, year or range) might not contain that row, because the combined report would have too many rows.
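The truncation effect described above can be illustrated with a toy model. This is a simplified sketch, not Matomo's archiving code: Matomo's exact rules (which metric it sorts by, whether the limit counts the summary row) may differ. The row limit of 2 and the page names are hypothetical, chosen so the effect shows up with three small days: `/target` makes the cut every day but lands in "Others" once the days are summed.

```python
from collections import Counter

OTHERS = "Others"


def truncate(rows: dict[str, int], limit: int) -> dict[str, int]:
    """Keep the `limit` busiest rows and fold everything else into an Others row."""
    rows = dict(rows)
    others = rows.pop(OTHERS, 0)  # an existing summary row stays a summary row
    ranked = sorted(rows.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:limit])
    others += sum(visits for _, visits in ranked[limit:])
    if others:
        kept[OTHERS] = others
    return kept


def archive_range(day_reports: list[dict[str, int]], limit: int) -> dict[str, int]:
    """Sum already-truncated day reports, then truncate the combined report again."""
    total: Counter = Counter()
    for report in day_reports:
        total.update(report)
    return truncate(dict(total), limit)


LIMIT = 2  # hypothetical; the comment above cites defaults of 500 and 100 rows
days = [
    {"/a": 10, "/target": 3},
    {"/a": 10, "/target": 3},
    {"/b": 30, "/target": 3},
]
day_reports = [truncate(d, LIMIT) for d in days]
print(all("/target" in r for r in day_reports))  # True: visible in every day report
print(archive_range(day_reports, LIMIT))  # {'/b': 30, '/a': 20, 'Others': 9}
```

The 9 visits of `/target` are not lost; they are counted inside the "Others" row, which is why a `label` filter on the longer period finds nothing.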