Collecting netlab usage data

ipspace commented 3 days ago

It would be great to know how people use netlab; currently, we can only guess as we get little feedback and zero hard data.

The proposal to implement the usage data collection and eventual upload is in docs/roadmaps/usage.md. Feedback or PRs against that file are most welcome.

ssasso commented 3 days ago

Some very rough ideas:

Receiving and storing data can be achieved using Cloudflare Workers + KV or D1 storage, or with AWS Lambda + DynamoDB (plus putting some limits on it).

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

(Edit: if we need more resources for collecting and storing data, we could apply for this: https://blog.cloudflare.com/expanding-our-support-for-oss-projects-with-project-alexandria)

DanPartelly commented 3 days ago

Ivan's proposed collection mechanism stores the data in a plain-text YAML dictionary, so any user can actually see the data collected, and the upload is user-triggered, so I guess this covers the issue.

I would personally also be interested in which host OSes netlab is used on.

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

ipspace commented 3 days ago

I would personally also be interested in which host OSes netlab is used on.

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

For example, uname -a produces a printout that someone might be able to deduce the Ubuntu release from, but it's way beyond my capabilities. Anyway, according to this https://gist.github.com/natefoo/814c5bf936922dad97ff, the whole thing is a bit of a mess.

ipspace commented 2 days ago

Receiving and storing data can be achieved using Cloudflare Workers + KV or D1 storage, or with AWS Lambda + DynamoDB (plus putting some limits on it).

These days I would definitely go with Cloudflare Workers + KV/D1/R2.

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

"The user could inspect the usage data with netlab usage show" ;) https://github.com/ipspace/netlab/blob/dev/docs/roadmap/usage.md?plain=1#L19

ssasso commented 2 days ago

OK for the inspection of collected data, but seeing some "reporting stats" could be interesting too, IMHO.

jbemmel commented 2 days ago

OK for the inspection of collected data, but seeing some "reporting stats" could be interesting too, IMHO.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

DanPartelly commented 2 days ago

I like this, conceptually. There is nothing like letting the user watch the data.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

DanPartelly commented 2 days ago

Sure, I'll look into it, and yes, you are right, this can be a can of worms. I had to fight it recently with CMake; its Linux detection sucks, so I had to overwrite the variables.

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

jbemmel commented 2 days ago

I like this, conceptually. There is nothing like letting the user watch the data.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

Maybe we could even talk to GitHub and make this into an officially supported feature. Usage data for open source projects, voluntarily provided by GitHub users, would be a great addition. I think many projects would use that.

DanPartelly commented 1 day ago

Perhaps the best way to determine the OS name without descending into madness is to use a systemd component, hostnamectl. Its output includes the correct distro name. It will of course only work on systems using systemd, but in 2024 all mainstream distros use it. Where it will fail is on musl libc-based distros, which still use alternate init systems by necessity (Alpine, Void Linux, Chimera), and on specialty distributions (embedded ... whatever).
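
A minimal Python sketch of that idea, under a few assumptions: the function names are hypothetical (nothing here is existing netlab code), the distro name is taken from the "Operating System:" line of the plain hostnamectl output, and the fallback to the PRETTY_NAME entry in /etc/os-release is an addition not discussed above, meant for hosts where hostnamectl is missing or fails.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def parse_hostnamectl(output: str) -> Optional[str]:
    """Return the value of the 'Operating System:' line, if present."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Operating System":
            return value.strip() or None
    return None


def parse_os_release(text: str) -> Optional[str]:
    """Return PRETTY_NAME from os-release style KEY=value text, if present."""
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "PRETTY_NAME":
            try:
                parts = shlex.split(value)  # removes shell-style quoting
            except ValueError:  # unbalanced quotes
                return None
            return parts[0] if parts else None
    return None


def detect_host_os() -> Optional[str]:
    """Hypothetical helper: try hostnamectl first, then /etc/os-release."""
    try:
        result = subprocess.run(
            ["hostnamectl"], capture_output=True, text=True, timeout=5, check=False
        )
        name = parse_hostnamectl(result.stdout)
        if name:
            return name
    except (OSError, subprocess.SubprocessError):
        pass  # hostnamectl not installed, not runnable, or timed out
    try:
        return parse_os_release(Path("/etc/os-release").read_text(encoding="utf-8"))
    except OSError:
        return None  # no usable source; report the host OS as unknown


if __name__ == "__main__":
    print(detect_host_os() or "unknown")
```

Keeping the parsing separate from the subprocess call makes it easy to check the logic against sample output from different distros without running the code on each of them.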

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

ipspace / netlab

Collecting netlab usage data #1481