offspot / package-requests

Request additions to Offspot here!
GNU General Public License v3.0

Metric collection system #17

Closed Popolechien closed 1 year ago

Popolechien commented 4 years ago

Folks at SolarSPELL run a similar project to our hotspot and have implemented a basic usage tracking feature: mdseiler/SolarSPELL.

Could we easily implement it into our own code?

rgaudin commented 4 years ago

Also worth mentioning: the OLIP plan.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

Popolechien commented 2 years ago

Here's a sample output from SolarSPELL: usagedata-10-22-2019 VU3.csv

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

rgaudin commented 1 year ago

We need specifications listing what you'd like so we can start a discussion on what's possible and how. We all know there are multiple ways around metrics:

letompouce commented 1 year ago

System metrics: OLIP uses a custom script to gather some system logs, but I'd like to leverage existing tools such as sosreport. We also have a basic battery monitoring script; maybe we should also monitor the hotspot connectivity itself, among other things. In the past we built a quick PoC using telegraf but didn't push this path very far.

Web logs: OLIP implemented a GoAccess-based report, which is basically a web log stats system; we added a poor man's app-specific filtering, but that's far from being "business metrics", for which the instrumentation should occur in OLIP (and/or Kiwix!).

In the end, we rsync all of this for future analysis, which means such metrics/logs must be stored on the device until an internet connection allows upload.

As for which business metrics would be needed, I feel we should rely on our MnE people's expectations. Adding this to my todo list.

rgaudin commented 1 year ago

Thank you for this detailed feedback @letompouce. I understand OLIP users have access to those 4 reports on their box with some metrics (looked at the screenshots in the documentation) giving an idea of the general use.

I am curious to know if this data is already driving actions, both at the deployment level (I doubt it, tbh) and at your central level. What use are you making of those system logs? Have they ever been useful for maintenance, for example?

letompouce commented 1 year ago

We have occasionally been asked to provide an overview of a device's or project's usage; I think that's pretty much it for the web stats.

The system logs are quite useful for troubleshooting when the device is not permanently connected; they give us some insight into the problem so we can act quickly once the device gets connected.

The logs are also useful when we wonder about global usage: for example, the overall time spent online vs offline, the disk storage usage, etc.

kelson42 commented 1 year ago

By reading the comments, I see:

Therefore, at this stage, the only thing we are basically sure of is that:

Doing this (see point above) would IMO be the first step(s) to take.

kelson42 commented 1 year ago

Based on the previous comment, and after discussion with @rgaudin, I am continuing my design work for a first milestone:

kelson42 commented 1 year ago

Here would be a base requirement in terms of dashboard (from a user perspective):

@Popolechien @rgaudin Could we start with this?

rgaudin commented 1 year ago

kelson42 commented 1 year ago

  • I disagree with the global/filtered feature. There's no clear, immediate benefit from it so it's not for a first version. Expecting users to input regexp (on what field?) seems quite disconnected from who our users are.

I guess users want a detailed view based on app/content, not only a global view.

The regex system is only how it works internally. You can decide for the user what the modules are, or ask them to just enter a URL prefix, or ask them for a regex. The point is: it's a URL filter. That way you can make one module per app (everything served via a kiwix-serve instance) or a module based on a specific content (requests made to a specific zim file).
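
To make the URL-filter idea concrete, here is a minimal sketch of how a module could be defined as a regex over request paths. The module names, the URL layout and the helper functions are all hypothetical, not taken from any existing Offspot or OLIP code.

```python
import re
from collections import Counter

# Hypothetical module definitions: module name -> URL filter.
# The paths below are illustrative only, not an actual hotspot layout.
MODULES = {
    "kiwix": re.compile(r"^/kiwix/"),                       # a whole app
    "wikipedia": re.compile(r"^/kiwix/wikipedia_en_all/"),  # one zim file
}


def requests_for_module(paths, pattern):
    """Return the subset of request paths matched by a module's URL filter."""
    return [path for path in paths if pattern.search(path)]


def hits_per_module(paths, modules=MODULES):
    """Count requests per module; a path may belong to several modules."""
    counts = Counter()
    for name, pattern in modules.items():
        counts[name] = len(requests_for_module(paths, pattern))
    return counts
```

A plain URL prefix entered by a user could be turned into such a filter with `re.compile("^" + re.escape(prefix))`, so the user never has to write a regex.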

  • I understand modules are kind-of mini dashboards: a collection of values/displays for specific KPIs.

It is exactly the same dashboard as the global one, just not based on the same data: it is built on a subset of the data.

  • I think the time ranges should be flexible: you can use such shortcuts or select specific date(s)

It's impossible to build a static dataset if you have a flexible date range. Not worth it IMO.

  • I'm not fond of the visit duration. We probably won't get it with a light log ingester as described above and it's usually too approximate to be useful. Users are not technicians and I'm afraid they might trust this untrustworthy data.

I have no strong opinion on this. It is IMO a complex discussion, let's discuss this separately.

  • More than anything, I think we need a list of content (packages) accessed with their number of visits and the distribution.

Good point! Might be a "global" only widget/chart (comparison between modules).

What is "the distribution"? Distribution over what exactly?

rgaudin commented 1 year ago

Ok I understand what you mean now. So with what you described we'd have only one module that is loaded with either the whole dataset (global) or a subset based on a prefix 👍

I believe flexible dates are very important. Fixed periods mean mixing wanted and unwanted data (because of the lack of precision), which adds noise and uncertainty to the output. Our biggest users work around activities that are date-bound, and being able to assess them will be really useful. I don't think it will use many resources to do this dynamically, but we should indeed choose one of those strategies before starting implementation.
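
As a rough illustration of the dynamic option, filtering already-ingested visits by an arbitrary date range is a single pass over the data. The `(date, path)` tuple shape and the function name are assumptions made for this sketch, not an agreed design.

```python
from datetime import date


def visits_in_range(visits, start, end):
    """Keep visits whose date falls within [start, end], both bounds inclusive.

    `visits` is assumed to be an iterable of (date, path) tuples, a
    hypothetical shape for rows produced by a log ingester.
    """
    return [visit for visit in visits if start <= visit[0] <= end]
```

For example, `visits_in_range(visits, date(2022, 3, 10), date(2022, 3, 31))` would isolate a three-week activity. The cost grows with the number of stored visits, which is the trade-off against precomputed fixed periods.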

Regarding distribution I meant over other packages/url. Numbers are useful but a pie chart showing that package X getting 30% of all traffic is direct, actionable information.
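
To illustrate, the share of traffic per package could be computed as below before feeding a pie chart. This is only a sketch: it assumes each visit has already been attributed to a package name, and `traffic_share` is a hypothetical helper.

```python
from collections import Counter


def traffic_share(packages):
    """Map each package name to its percentage of all recorded visits.

    `packages` is assumed to hold one package name per visit.
    """
    counts = Counter(packages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```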

kelson42 commented 1 year ago

I don't think it will use many resources to do this dynamically, but we should indeed choose one of those strategies before starting implementation.

We will have to discuss a bit that point then :)

One possible compromise might be to declare the event a priori rather than a posteriori.

Regarding distribution I meant over other packages/url. Numbers are useful but a pie chart showing that package X getting 30% of all traffic is direct, actionable information.

Definitely agree!

Popolechien commented 1 year ago

Numbers are useful but a pie chart showing that package X getting 30% of all traffic is direct, actionable information.

I agree with this. At this (early) stage I would favour a dashboard that may be less customizable but that strongly emphasizes a visual display of whatever data is collected (pie charts, etc.).

rgaudin commented 1 year ago

Similar project https://datapost.site (using https://www.chartjs.org/ for nice charts)

Popolechien commented 1 year ago

re: datapost, I've reached out to the folks at World Possible and here's what they had to say:

We haven't open-sourced the DataPost code, we're still sorting through our roadmap for the product really. We do plan to release an installer for other devices at some point in the future, there are just lots of considerations around how we would run DataPost on devices that don't have an internal battery or real-time clock we haven't started to tackle yet.

kelson42 commented 1 year ago

Nice... we will have to partly redo what they are doing. Looks like the FOSS spirit only goes one way with them.

Popolechien commented 1 year ago

Comment from user (Literate Earth, running schools in East Africa):

primarily we wanted to be able to track engagement with Wikipedia for Schools. I think this would be number of searches and number of clicks on hyperlinks.

benoit74 commented 1 year ago

I've begun to assemble some implementation details here: https://github.com/offspot/container-images/tree/metrics/metrics

rgaudin commented 1 year ago

Superseded by https://github.com/offspot/metrics/issues/1. Closing this so we have a single discussion channel.