kiwix / web

Bugs, enhancements, ideas for our Web presence
https://kiwix.org

stats.kiwix.org lacks granularity AND is too granular #216

Open Popolechien opened 1 year ago

Popolechien commented 1 year ago

Looking at stats for download.kiwix.org I can kind of surmise that around 12,000,000 zim files were downloaded over the past year.

The tool, however, both fails to aggregate different versions of the same file (e.g. wikipedia_en_all_maxi_2022-05.zim and wikipedia_en_all_maxi_2023-05.zim) and does not show more than the top 500 rows.

We need either a better tool or to make sure this one provides actionable feedback.

rgaudin commented 1 year ago

fails to aggregate different versions of the same file (e.g. wikipedia_en_all_maxi_2022-05.zim and wikipedia_en_all_maxi_2023-05.zim)

That's because those are two different files. Those files are different versions of the same Book (CMS terminology). Only a custom tool could know that those are linked and should produce an aggregated counter.

I don't think messing with the source logs is a good idea, so you're probably left with creating or modifying a tool that works off the matomo API/data and produces this. It might be a matomo extension or something separate.
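
A minimal sketch of what such a custom aggregation could look like, assuming per-file hit counts have already been exported (from matomo or elsewhere) as (filename, hits) pairs, and assuming every version of a Book differs only by a trailing `_YYYY-MM` stamp before `.zim`, as in the two filenames quoted above. The helper names and the sample counts are made up for illustration.

```python
import re
from collections import Counter

# Assumption: versions of a Book differ only by a trailing "_YYYY-MM" stamp.
_PERIOD_SUFFIX = re.compile(r"_\d{4}-\d{2}\.zim$")


def book_name(filename: str) -> str:
    """Strip the period stamp so all versions of a Book share one key."""
    return _PERIOD_SUFFIX.sub("", filename)


def aggregate_by_book(rows):
    """Sum per-file hit counts into per-Book totals.

    `rows` is an iterable of (filename, hits) pairs.
    """
    totals = Counter()
    for filename, hits in rows:
        totals[book_name(filename)] += hits
    return totals


# Made-up counts, for illustration only.
sample = [
    ("wikipedia_en_all_maxi_2022-05.zim", 10),
    ("wikipedia_en_all_maxi_2023-05.zim", 5),
    ("wikipedia_en_for_schools_maxi_2023-07.zim", 2),
]
totals = aggregate_by_book(sample)
print(totals["wikipedia_en_all_maxi"])
```

Filenames that do not end with a period stamp are left unchanged by `book_name`, so they end up as their own key rather than being silently merged.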

Popolechien commented 1 year ago

To clarify, I'm trying to get the number of downloads for wikipedia_en_for_schools_maxi.zim (and the Arabic version) over the past two years (1 August 2021 to 31 July 2023). Since the Zimfarm generates a new zim every month, I will also need the total across all of those monthly files.
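
For reference, a request like this could in principle be expressed against the matomo Reporting API rather than the web UI. The sketch below only builds the URL and does not send it. It assumes the instance is reachable at https://stats.kiwix.org/index.php, that download.kiwix.org is site 2 (the `idsite` used in the SQL later in this thread), and that the anonymous token is allowed to read the report; none of these is confirmed here. `downloads_report_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def downloads_report_url(base_url, id_site, start, end, pattern, token="anonymous"):
    """Build a matomo Reporting API URL for the downloads report.

    Dates are ISO strings (YYYY-MM-DD); `pattern` filters rows by label.
    """
    params = {
        "module": "API",
        "method": "Actions.getDownloads",
        "idSite": id_site,
        "period": "range",
        "date": f"{start},{end}",    # custom date range
        "format": "JSON",
        "flat": 1,                   # flatten the per-directory hierarchy
        "filter_pattern": pattern,   # keep only rows matching this pattern
        "filter_limit": -1,          # ask for all rows, not just the first page
        "token_auth": token,
    }
    return f"{base_url}/index.php?{urlencode(params)}"


url = downloads_report_url(
    "https://stats.kiwix.org",       # assumed base URL
    2,                               # assumed site id
    "2021-08-01",
    "2023-07-31",
    "wikipedia_en_for_schools_maxi",
)
print(url)
```

Each monthly file would still come back as its own row, so the total has to be summed afterwards. If the instance truncates rows when it archives reports, `filter_limit` alone may not bring the missing rows back.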

kelson42 commented 1 year ago

If you put "wikipedia_en_for_schools_maxi" as the filter, you should get your number. I see no results at all, so either nobody has downloaded it in the last 12 months, or we have a bug, or I don't understand how it works. Anyway, I have just downloaded it, so within an hour, worst case, there should be at least one download.

Popolechien commented 1 year ago

I've just checked and it did not pick it up as far as I can tell.

kelson42 commented 1 year ago

@rgaudin OK then it looks like a bug, either in the log gathering part or in matomo.

rgaudin commented 1 year ago

I am currently trying to find the record for this hit in the DB, if that's possible. It will then be easier to know what to look at next.

rgaudin commented 1 year ago

I found the hit in matomo's DB so we can rule out a download log capture/upload issue.

Here's how I found it

SELECT * FROM piwik_log_visit WHERE idsite=2 AND location_country="ch" AND visit_first_action_time >= "2023-08-11 21:00:00" AND visit_last_action_time <= "2023-08-11 23:00:00"

There were several records. I identified @kelson with the location, time and OS which gave me idvisit=21264276

SELECT * FROM piwik_log_link_visit_action WHERE idvisit=21264276

There were a few results. I checked the URLs from idaction_url column with piwik_log_action table which told me the one we are looking for is idaction=9162997

SELECT * FROM piwik_log_action WHERE idaction=9162997;
idaction name hash type url_prefix
9162997 download.kiwix.org/zim/wikipedia/wikipedia_en_for_schools_maxi_2023-07.zim 4067615011 1 2

So the hit was recorded by matomo.
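
The three lookups above can also be sketched as a single JOIN. The snippet below stages tiny stand-in tables in an in-memory SQLite database so it runs anywhere; only the columns used in this thread are created, the visit times and the second visit are made up, and the real instance is a much larger matomo database where the same query may be slow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in tables with only the columns referenced in this thread.
conn.executescript("""
CREATE TABLE piwik_log_visit (
    idvisit INTEGER PRIMARY KEY, idsite INTEGER, location_country TEXT,
    visit_first_action_time TEXT, visit_last_action_time TEXT);
CREATE TABLE piwik_log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);
CREATE TABLE piwik_log_action (idaction INTEGER PRIMARY KEY, name TEXT);

-- Sample rows: the ids come from the thread, the times are invented.
INSERT INTO piwik_log_visit VALUES
    (21264276, 2, 'ch', '2023-08-11 21:30:00', '2023-08-11 21:45:00'),
    (1,        2, 'fr', '2023-08-11 21:30:00', '2023-08-11 21:45:00');
INSERT INTO piwik_log_link_visit_action VALUES (21264276, 9162997), (1, 9162997);
INSERT INTO piwik_log_action VALUES
    (9162997, 'download.kiwix.org/zim/wikipedia/wikipedia_en_for_schools_maxi_2023-07.zim');
""")

# Visit -> link table -> action, filtered the same way as the first query.
QUERY = """
SELECT v.idvisit, a.idaction, a.name
FROM piwik_log_visit AS v
JOIN piwik_log_link_visit_action AS lva ON lva.idvisit = v.idvisit
JOIN piwik_log_action AS a ON a.idaction = lva.idaction_url
WHERE v.idsite = ?
  AND v.location_country = ?
  AND v.visit_first_action_time >= ?
  AND v.visit_last_action_time <= ?
"""
rows = conn.execute(
    QUERY, (2, "ch", "2023-08-11 21:00:00", "2023-08-11 23:00:00")
).fetchall()
print(rows)
```

With real data the result would list every URL touched by every matching visit, so the location, time and OS check described above is still needed to pick out one visitor.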

Out of curiosity (expensive query!)

SELECT COUNT(*) FROM piwik_log_link_visit_action WHERE idaction_url =9162997;
COUNT(*)
257

Not all rows in that table are individual downloads. There are many columns with non-obvious names, and there's this action concept that is mapped to other tables (and some things reference one another). But there are records for that ZIM.
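
To get one figure across every monthly version rather than a single `idaction`, the same count could be joined to the action table and filtered by name. This is only a sketch on made-up rows in an in-memory SQLite database (idaction 100 and 101, their filenames and all visit ids are invented), and the caveat above still applies: the figure counts rows in the link table, which are not necessarily individual downloads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in tables with invented rows, for illustration only.
conn.executescript("""
CREATE TABLE piwik_log_action (idaction INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE piwik_log_link_visit_action (idvisit INTEGER, idaction_url INTEGER);

INSERT INTO piwik_log_action VALUES
    (9162997, 'download.kiwix.org/zim/wikipedia/wikipedia_en_for_schools_maxi_2023-07.zim'),
    (100,     'download.kiwix.org/zim/wikipedia/wikipedia_en_for_schools_maxi_2023-06.zim'),
    (101,     'download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2023-05.zim');
INSERT INTO piwik_log_link_visit_action VALUES
    (1, 9162997), (2, 9162997), (3, 100), (4, 101);
""")

# Count link rows whose action name matches any version of the Book.
# Note: "_" in a LIKE pattern matches any single character, which is
# loose but harmless for this filename.
QUERY = """
SELECT COUNT(*)
FROM piwik_log_link_visit_action AS lva
JOIN piwik_log_action AS a ON a.idaction = lva.idaction_url
WHERE a.name LIKE ?
"""
pattern = "%/wikipedia_en_for_schools_maxi_%.zim"
(total,) = conn.execute(QUERY, (pattern,)).fetchone()
print(total)
```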


My opinion is that matomo is a complex tool and we (well you 😀) don't know exactly how to use it. I'd suggest you describe your use case on a matomo forum or to matomo support so we know exactly how to get the information you're looking for. Then we may come back to a configuration issue in our instance.