inventree / InvenTree

Open Source Inventory Management System
https://docs.inventree.org
MIT License

[FR] Add telemetry #4150

Open matmair opened 1 year ago

matmair commented 1 year ago

Problem statement

We currently know neither which versions users are running, nor on which systems, nor whether plugins or other 'new' features are used.

Suggested solution

Implement user telemetry that reports back a few non-identifying instance metrics daily or weekly. Maybe also on plugin installs/activation. Must be Opt-In and all submitted data should be transparent - maybe a log of the last x events that were sent could be visible to superusers.

This should be discussed thoroughly before implementation - probably as a plugin. Has the potential to be controversial, especially with EU citizens.
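For illustration, here is a minimal sketch of the opt-in behaviour described above: nothing is reported unless telemetry is switched on, and the last x payloads are kept so superusers can inspect them. All names here (`TelemetryReporter`, `history_size`, the payload fields) are hypothetical assumptions, not existing InvenTree code.

```python
import json
import platform
from collections import deque


class TelemetryReporter:
    """Hypothetical opt-in reporter; a sketch, not part of InvenTree."""

    def __init__(self, enabled=False, history_size=20):
        # Opt-in: disabled unless an admin explicitly switches it on.
        self.enabled = enabled
        # Bounded log of the last `history_size` payloads, for transparency.
        self.sent_log = deque(maxlen=history_size)

    def build_payload(self, version, active_plugins):
        # Only coarse instance metrics; the field choice is an assumption.
        return {
            "version": version,
            "python": platform.python_version(),
            "os": platform.system(),
            "plugin_count": len(active_plugins),
        }

    def report(self, version, active_plugins, send=None):
        # Do nothing at all when the instance has not opted in.
        if not self.enabled:
            return None
        payload = self.build_payload(version, active_plugins)
        if send is not None:
            # `send` stands in for whatever transport would be chosen.
            send(json.dumps(payload))
        self.sent_log.append(payload)
        return payload
```

Keeping the log in a bounded `deque` means the oldest entries drop off on their own, so the transparency view would always show exactly what was sent most recently.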

Describe alternatives you've considered

We could try to use other gauging mechanisms like switching the update checker to a custom server and using the user-agent to get passive data. That feels a bit sneaky though.

Examples of other systems

I love the way octoprint does it: https://tracking.octoprint.org/. They expose the results to https://data.octoprint.org/ - which feels like a good idea in general. ELK stack + Grafana on a VPS might be a good solution.

matmair commented 1 year ago

@inventree/maintainer @inventree/triage @miggland @wolflu05 @SergeoLacruz @Bbillyben @martonmiklos @rkalman @Zontex thoughts? Please 👍 / 👎 if you think it is generally a good or bad idea.

Zontex commented 1 year ago

@matmair I think it's necessary, and developers like myself will support it. It should be built in, disabled by default, and maybe ask once after installation whether to enable it or not.

Bbillyben commented 1 year ago

Hi @matmair, this feature would be quite interesting. It's really frustrating to develop a tool and not have any clue how it is used.

I'm an EU citizen, and a large part of my job is tied to GDPR and personal data, so I'm totally biased and quite paranoid about collecting data. It's freaking hard to stay outside personal data, and it tends to get harder and harder. With enough good sources, probabilistic matching could lead to good identification results, and its efficiency increases with the amount of data collected.

And I'm always sceptical when a server is collecting data from me and stating it's only for internal use.

SergeoLacruz commented 1 year ago

Good evening,

Thanks for asking my opinion. As you probably know, I am a German boy, so like Bbillyben I have my special opinion on things like that :-) Anyhow, I see the advantage for the developers. On huge projects with millions of users you cannot live without it, because the usage scenarios are many and different, and it is very difficult to hunt bugs with little info. I know such systems from my company, but I cannot judge whether InvenTree is already large enough to really need it.

If we had enough resources and lots of developers, go ahead. It is probably interesting and fun for the guys who do this. On the other hand, if I look at the commits, InvenTree is developed by very few people. The telemetry system has to be implemented. That's one thing. The data has to be maintained somewhere, and someone has to evaluate the results. Depending on the user base, this can be a lot of work.

I am not a SW developer but a user of the system, a HW developer. IMHO there are other, more important things on the agenda. For example, the parameter search system is actually not doing anything useful. I have read that it will be reworked, which is great. From the user's point of view, such and similar tasks are more important. I hope there are many more users than developers. Just my two cents.

Michael

matmair commented 1 year ago

There is a reason I pinged you guys @SergeoLacruz . As an Austrian with fondness for the Chaos-sphere I am conscious of the implications.

The problem is that we have a very hard time getting user feedback. We tried user surveys tied to releases. Only 8% of the people who opened the link finished the 3-5 min survey. That is 30 responses in a month, which is not enough to get significant data. It is pretty much like flying in the dark without instruments. We can try to estimate from stars, docker pulls, GitHub visitors or package downloads, but it is still very hard to get even a rough picture of how many people use the software. Maybe we are just 50 guys and gals with wrongly configured autoupdate.

I have built out this kind of infra at work and would feel comfortable building it. Analytics would consist of one public dashboard and a daily cleanup job. My goalpost would be something like OctoPrint's approach, which seems to work fine for them.

I am honestly hoping someone has a less controversial idea for how to get adoption and environment data. A switch to a minimum of Python 3.10 (which would enable some great language improvements) is very hard the way things are now. Last time, a bunch of users were upset and unable to continue usage due to their OS.

martonmiklos commented 1 year ago

From my side: I am not yet using InvenTree in any "production grade" environment, but as I foresee my needs, I am likely going to be rolling with master and fixing/implementing things continuously.

Telemetry-wise, I tend to turn it off in most of the software I use (force of bad habits/stereotypes). However, the projects I contribute to are quite different; I would say I have a different level of trust there.

I understand why you need this feature; let me know if I can assist with any of it. That's my two cents.

miggland commented 1 year ago

Interesting question, @matmair!

I think a system like this is implementable - but of course it will take work to make the policies, choices etc clear. Keeping this up to date in the documentation is also required. With the correct opt-in choices, privacy policies etc. I believe you can be GDPR-compliant.

On the other hand, the infrastructure will require up-keep as well. I believe you can set this up, but setting it up once is never enough. The more things you have to take care of, the more work it takes day-to-day just to keep things going..

If it's implemented, your suggestion of making it openly available makes sense.

Before collecting data though, I would ask the question about what it will be used for. Which type of data (specifically) is interesting, and which type of questions is the data going to inform? After you have a list (which may of course change once data is available and gives you new ideas..), you have a better way of searching for alternatives to answer them. It will also give you an idea of the value of such telemetry vs the costs.

As an example, your questions above:

Question | Use | Alternative
Which version are users running? | Do we need to release security patches for older versions? | Pulls, stars. Communicate support for versions.
Are plugins used? | How much work to invest in them? | Work on plugins which have active support in issues/discussions/PRs.
... | ... | ...

I still don't quite see the value in the telemetry: there are more than enough bugs and FRs open to not have to search through data to be inspired on what to do. Focussing on the active input isn't such a bad way to go, in my opinion. Those who have an interest will simply have to come to GitHub and join the discussion. Those who don't won't be able to shape the way InvenTree develops. As a minor contributor, I don't think any telemetry data would influence the types of issues I choose to work on. I'll continue to focus mostly on those that seem useful to myself. If there were an organisation or company behind InvenTree, the picture would be quite different.

Just like @martonmiklos, I tend to turn these usage reports off whenever I encounter them.

matmair commented 1 year ago

@miggland if we want companies sponsoring employees to spend time on InvenTree some kind of MAU number would make the discussion a lot simpler.

Making decisions based on who shouts the loudest does not feel like a strategy to me. I am personally not really interested in developing features for a select few users, especially if they could program them themselves. Keeping things in core that are not really used just leaves unnecessary breaking points.

Going just by active support requests and tagged repos, I would not have evolved/developed the plugin system.

miggland commented 1 year ago

Well then that should be part of the analysis - the determination of what to work on, and what is a motivation for this, is a central part of this question I think. For an organisation, it's obviously different to a single person. Your motivation may well be different to mine, which I don't think is a problem in any way.

Before you have a question clearly in mind, and an understanding of the value of answering those questions, I believe it's not possible to judge if making a telemetry system, which has some "costs" associated with it, is worthwhile.

wolflu05 commented 1 year ago

Interesting approach. What kind of statistics and info do you get through that?

I have a script running that automatically scrapes the bug reports to get statistics, but it only works with text info.

Originally posted by @matmair in https://github.com/inventree/InvenTree/issues/4159#issuecomment-1372412931

matmair commented 1 year ago

Mainly currently used versions with a timestamp. I also measure engagement from non-team members to gauge interest. Makes interesting plots but nothing you can rely on. Shows mainly that a lot of users ignore updates. What can I say - an engineer gets inventive if he has no other options.
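To make the scraping idea concrete, here is a rough sketch of pulling version numbers out of free-text bug reports and counting them. It is not the actual script. The regex and the function names are assumptions, and a bare `x.y.z` pattern will also match unrelated strings such as the start of an IP address, which is part of why the results are nothing to rely on.

```python
import re
from collections import Counter

# Assumption: versions appear in the report text as plain "major.minor.patch".
VERSION_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\b")


def extract_version(text):
    """Return the first x.y.z version found in the text as a tuple, or None."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def version_counts(reports):
    """Count how often each version shows up across a list of report texts."""
    counts = Counter()
    for text in reports:
        version = extract_version(text)
        if version is not None:
            counts[version] += 1
    return counts
```

Returning tuples instead of strings keeps the versions comparable, so `(0, 8, 4) < (0, 9, 0)` can be used to see how far a report lags behind the latest release.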

wolflu05 commented 1 year ago

Shows mainly that a lot of users ignore updates.

I can imagine this is because users set it up once and forget about updating the instance. They are just happy it's running and they can manage their inventory. But it is a bit strange that you got this info only from bug reports. Having bugs should tell users that they should update. I would rather expect this result from the telemetry discussed here.

matmair commented 1 year ago

You would be amazed by the support requests for updating multi-year-old instances. Most bug reports lag behind by at least one major release (for example, problems with 0.8.4 while 0.9.0 is released).

SergeoLacruz commented 1 year ago

Yes. If a SW system is used in a larger company or enterprise, you cannot follow each release. Switching to a new release of e.g. a CAD system creates a lot of work for the IT department. New revisions have to be evaluated and checked, and need to follow rules and processes. Usually the tool is integrated in a design flow with various interfaces. Compatibility needs to be checked. Users need to be trained on new features... an endless list. So updates are limited.

But in case it is an open source system you cannot bother the developers with bugs in an old release that have already been fixed. This needs to be solved internally. We are on 0.8.4 :-)

Michael

matmair commented 3 months ago

I have recently researched the approaches taken by similar systems in the closed and open space and compiled the outline below. I will start working on the plugin soonish; the backend is already validated. Because it uses the Prometheus push protocol, you can use dozens of different backends for data recording / grouping.

Telemetry setup

Backend tech

Suggested metrics

All the metrics I would start with and why this section is interesting for InvenTree development / support considerations.

SchrodingersGat commented 3 months ago

Thanks @matmair, this is very well considered.