Anonymous usage statistics

blitzmann commented 5 years ago

Going to start the discussion / tracking on this.

One of the biggest pain points I have as a pyfa developer is simply not having information on how pyfa is being used. This includes what features are being used and how often, which makes it impossible to make informed decisions on whether or not we can deprecate a feature (XML/DNA import/export for example), or what to prioritize as we work on things.

I propose that we introduce anonymous usage statistic tracking into pyfa. This would be an opt-in feature, and would allow us to gather information on when actions are being done in pyfa. There's a very simple package that can help with this:

https://github.com/remram44/usagestats

It doesn't seem to be very popular, but usage tracking in Python doesn't seem to be common in general. It has a very simple API that will allow us to track things as key-value pairs (for example, {'action': 'Export HTML'}).

I would like to stress that it is anonymous. What would happen is that usagestats would generate a unique ID for the user's installation, and that ID would be used to associate their actions for a session. The ID persists across sessions, but we wouldn't tie the ID back to any identifiable information (EVE character, for example). The data provided would simply allow us to gauge how people may be using pyfa so that we can better direct development.
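
To make the ID handling described above concrete, here is a minimal sketch. It is not the usagestats API: `load_or_create_install_id` and `build_report` are hypothetical names, and storing the ID in a plain file is an assumption made only for illustration.

```python
import json
import uuid
from pathlib import Path


def load_or_create_install_id(path: Path) -> str:
    """Return the ID stored at `path`, creating a random one on first run."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    # Random value; not derived from the user, the machine, or any EVE data.
    install_id = uuid.uuid4().hex
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id, encoding="utf-8")
    return install_id


def build_report(install_id: str, notes: list) -> str:
    """Serialize one session's notes; the ID is the only link between sessions."""
    return json.dumps({"id": install_id, "notes": notes}, sort_keys=True)
```

Calling `load_or_create_install_id` twice with the same path returns the same value, which is what lets reports from separate sessions be grouped without knowing who sent them.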

Rarilmar commented 5 years ago

If pyfa starts sending data to some server we'll be banned in Goonswarm from using it so pls don't add this 'feature' lol

vertexpreacher commented 5 years ago

A very bad idea. Your app will surely be banned in some places.

DarkFenX commented 5 years ago

Will it be an issue even if it's "opt-in" (user gives explicit consent to send such data, and no data is sent by default)?

FYI this discussion between me and @blitzmann arose when I decided to remove the XML and DNA fit-export-to-clipboard features (to avoid visual clutter, as the set of options for the EFT and multibuy formats was expanding). While he agreed, he was still unsure whether these were being used or not - we have no usage statistics.

As part of #1887 I wanted to remove a few more menu items (import fittings, export fittings, import character file). Usage statistics would be cool to have - otherwise we will have to do such changes somewhat blindly, making some assumptions about how users use the software.

ps Personally, I am completely fine with blindly removing stuff :P

blitzmann commented 5 years ago

This would be an opt-in feature,

So don't opt-in.

I want to stress that actual data about fittings or characters would obviously not be reported on. In fact, I can't think of a scenario where I would want that - I don't care for it. That's why it's anonymous usage stats. I am solely interested in the following:

How many folks actually use pyfa (just for fun, can kind of get this info from number of downloads per release and the use of our SSO proxy)
Operating system metrics (how many people use OS X 10.10 vs the latest, or even how many people use mac client vs Windows).
What actions are taken within the application. I have no interest in data. This is more of a "how many people actually use the right click price panel > reset price cache, how many people actually export to XML files, or how many people use market drag to fit vs double click", and not "module x was dragged to ship y by character in alliance z".
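
A rough sketch of the kind of question these reports would answer, for illustration only. The report shape (an `id` plus a list of `notes` with an `action` key) and the function name `installs_per_action` are assumptions, not an agreed format.

```python
from collections import defaultdict


def installs_per_action(reports):
    """Count how many distinct installations performed each action.

    `reports` is an iterable of dicts shaped like
    {"id": "<installation id>", "notes": [{"action": "<name>"}, ...]}.
    """
    seen = defaultdict(set)
    for report in reports:
        for note in report["notes"]:
            # Only the action name is read; no fitting or character data exists here.
            seen[note["action"]].add(report["id"])
    return {action: len(ids) for action, ids in seen.items()}
```

Counting distinct installations rather than raw events keeps one heavy user from making a rarely used feature look popular.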

I understand the need for opsec - it's very important, and the EVE community has always been touchy on these things. That's why in the past I've introduced things like toggles in the preferences to turn off even the most mundane network traffic such as update checks, price checks, etc. to keep the more paranoid among us happy. Additionally, when I updated pyfa for EVE SSO, I implemented it in a way that would ensure the most security when it comes to the refresh tokens, in case a user's fitting database was ever leaked by mistake. I also added the ability for users to use their own developer credentials rather than route the traffic through pyfa.io for SSO (whose source is also open, BTW, and which would be the one to collect these usage stats if this feature is developed out). I'm aware of the EVE community's paranoia as it relates to these matters, but I always give folks a way to bypass them. In this situation, if the feature is developed, I envision that when you first start up pyfa after the feature is released, you will be prompted to send these usage reports. A simple "No" click, and you're done; pyfa would work just like it always has.

But I think folks also need to realize that pyfa is a 9-year-old program, with a bunch of old code in it that is hard to maintain, and keeping features updated that aren't being used / tracking down bugs related to features that aren't being used (like the infamous HTML export) takes up a lot of development time. pyfa is and will always remain an open source program, and so the actions that we would report are all exposed; nothing is hidden. I think there is also probably a way to save usage reports on users' machines so that one can examine what's being sent if desired.

When I used to play, I was never in a huge alliance that had restrictions on what I could install on my own computer, what websites I could visit, etc, so I may not be "getting" it, which is why I've opened this up for discussion. I'm assuming they also banned things like O.smium when it was around? Fleet-up? Basically any web-based program that isn't on their own servers?

Also, if anyone in any alliance leadership would like to discuss their concerns in private, I can always be contacted in pyfa's Slack channel (see README). That goes for anyone really. :)

And to clarify: this isn't a feature in development yet, and there's no time line for it. We haven't decided to commit to it yet. I am very much open to continued discussions. :)

vertexpreacher commented 5 years ago

We cannot trust "opt-in" if the scopes are already required before we push opt in. If you want to know about the operating system, do a poll: Linux distro, Ubuntu 16.04, 16 GB RAM, etc.

@blitzmann I don't want to get into a wall of text about what I feel or believe a large alliance should do. I am just telling you how it is.

O.smium, Fleet-up, zkill, etc. are banned. If it tracks location, fleets, hangars, etc., it's banned. It's not paranoia. They are out to get us. :P

DarkFenX commented 5 years ago

@vertexpreacher what do you mean by:

if the scopes are already required before we push opt in

?

Can you get one of your security guys to talk to us? I believe that's the only way to see what the concerns really are.

I am not a strong proponent of the feature myself, but I cannot see how adding it will change anything - as Pyfa already conducts some networking activity.

ghost commented 5 years ago

Personally I don't have a strong opinion, but I would like to suggest an alternative solution.

It could be possible to make a temporary release with telemetry tracking built in separate to the normal Pyfa release, this way users who want to provide developers with their usage data can do so without affecting mainstream users and tinfoil hat aficionados.

We can monitor the usage for a limited time on the separate release and then use that to inform development on the main branch. Once the data has been analysed the data tracking branch can be discontinued.

skyride commented 5 years ago

I don't see any alliance trying to ban their members from running a universally used and trusted piece of open source software on their own computer because the developers added opt in anonymous usage statistics if for no other reason than it'd be completely unenforceable. That seems like concern trolling at best.

Fits themselves are generally not too opsec either. Capitals and supercapitals have a limited number of optimal fits based on the use case, with only minor variation. For every other ship you can tell who's using what by just looking at zkill or with a very low effort spy. Knowing that someone somewhere is playing with a rail Proteus fit is not useful information; it's the levels of context that make them actionable, e.g. $character in $alliance is playing a Naglfar fit named "$region dread bomb".

I've used EFT for years and so have most people I've flown with. I'd have no problem with what you're proposing.

vertexpreacher commented 5 years ago

@skyride it's not a troll. That is all, your own opinion is your own.

skyride commented 5 years ago

@skyride it's not a troll. That is all, your own opinion is your own.

Concern trolling is probably a bit strong, but you're making a spurious argument.

The services you listed are all web based and thus can lie about what they're doing. If you were storing fits on O.smium they could be pinched with exactly the kind of hypothetical context I proposed. Certain groups want to keep their APIs off zkill as they don't want to publicise their activities, and Fleet-up hands over a massive amount of sensitive information to a tool hosted on someone else's servers as part of its core functionality.

Pyfa meanwhile is open source, runs on your own computer so you can see exactly what it's doing if you care to look, and doesn't require ESI scopes in any capacity for its primary function of ship fitting. You specifically mentioned tracking fleets/hangars/locations, which, if you took 10 seconds to check, you'll see Pyfa doesn't ask for. It only asks for read skills and read/write personal fittings 😛

vertexpreacher commented 5 years ago

OK, what will PYFA monitor? step by step.

DarkFenX commented 5 years ago

We have no comprehensive list of actions yet. It's just an idea and a discussion of it. As for a few examples - see the 2nd post of @blitzmann in this thread.

blitzmann commented 5 years ago

@vertexpreacher

We cannot trust ''opt-in'' if the scopes are already required before we push opt in.

I'm also wondering about this. Nothing is required up front, except the package. When you first start up pyfa, it'll simply be a dialog that asks "do you want to do this thing?". If you click "yes", then great, we'll collect user actions and pyfa will upload them either periodically or on close (unsure about when exactly; both have disadvantages, but they don't pertain to this discussion). If you click "no", then nothing happens. We'll still have to call the functions that register the actions, but nothing will be done with them, plain and simple.
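
A minimal sketch of that behaviour, assuming a hypothetical `UsageRecorder` class (this is not the usagestats API, and the upload step is reduced to a callable passed in by the caller):

```python
class UsageRecorder:
    """Hypothetical recorder: collects action notes only after explicit consent."""

    def __init__(self, opted_in: bool = False):
        # Defaults to False, so nothing is collected unless the user said yes.
        self.opted_in = opted_in
        self.pending = []  # notes waiting to be uploaded

    def note(self, info: dict) -> None:
        # Call sites invoke this unconditionally; without consent it does nothing.
        if self.opted_in:
            self.pending.append(dict(info))

    def flush(self, send) -> int:
        """Hand pending notes to `send` (a callable) and return how many were sent."""
        if not self.opted_in or not self.pending:
            return 0
        batch, self.pending = self.pending, []
        send(batch)
        return len(batch)
```

Keeping the consent check inside `note` and `flush` means the rest of the code never has to ask whether tracking is enabled, and a "no" answer leaves the list of pending notes empty for the whole session.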

Again, for reference, this is the package that I'm looking at. It has a pretty brief README on how it functions, please read it.

https://github.com/remram44/usagestats

I understand concerns about data collection. Really I do. What I don't understand is issues with opt-in usage reporting for a reputable open-source application that has been with the community for 9 years.

As for who would have access to said data: only core pyfa developers. That currently includes only @DarkFenX and myself. I for one haven't played the game in 4 years, so I'm not much invested in collecting any secrets. I still maintain and develop pyfa because I find it fun and I like serving the community.

@burnsypet had an interesting idea, but really, we're not going to maintain and release two separate code bases. If anything, we would have two different builds - one that simply prompts the user, and one that doesn't and assumes a "No" answer. But that's essentially the same thing as clicking "no" to begin with. :/

OK, what will PYFA monitor? step by step.

As @DarkFenX said, we haven't even gotten around to talking about this. 😄

But, off the top of my head (again, these are actions, not the underlying user data. When I say I want to log exports, I mean "EFT Export with x and y options selected", not "the EFT fitting"):

Main menu > pretty much everything that is clicked. We have no idea what's useful to the community and what's not, or what can be organized better to showcase an otherwise obscure feature.
All Ctrl+C export formats and options
Obscure shortcuts (Ctrl+Space to switch from market to fittings, Ctrl+F to focus on current search bar)
How often is "Recent Used Modules" actually used
How often do people use "Hide Empty Ship Groups" (this would be very interesting, because implementing this feature actually slowed down performance at first)
How often do people drag and drop items to their respective panels
Basically all the fitting context menu things (how often people switch skills through the menu, pick variations, "Fill with Module", market jump, ship jump, etc.)
I want to know how many people actually disable the rack separators so I can finally remove the god awful code behind that steaming pile
How many people use the notes field in their fitting
What do people use in their "include in total price"
How many people use the "browse fittings" and "export fittings" (again, just want to know how prevalent the feature is, don't care about the fittings themselves)
How many people switch EHP and HP in resists panel
How many people use the race selector icons on the bottom of the ship browser
How often is "recent fits" enabled
How often do fit snapshots actually spawn
How often is the mining panel toggled

For those that are particularly concerned about what actions we would log, GitHub has an amazing feature for code searching. Here's an example:

https://github.com/search?l=&q=GetFirstSelected%28%29+repo%3Apyfa-org%2Fpyfa&type=Code

If one wanted to audit the things that we report on, you could probably just search for stats.note( or whatever functions we end up using for registering an action. If this feature is developed out, documentation on things like this would be provided so folks would know what we collect, and can easily verify through source.
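
The same audit could be done locally on a checkout. A small sketch, assuming the call ends up being spelled `stats.note(` as floated above (the real name may differ) and using a hypothetical helper `find_note_calls`:

```python
import re

# Hypothetical call name taken from the comment above; the real one may differ.
NOTE_CALL = re.compile(r"stats\.note\(")


def find_note_calls(source: str, filename: str = "<source>") -> list:
    """Return (filename, line number, stripped line) for every matching call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if NOTE_CALL.search(line):
            hits.append((filename, lineno, line.strip()))
    return hits
```

Feeding it the text of each `.py` file in the repository (for example, the files found with `pathlib.Path.rglob("*.py")`) would list every place an action is registered, along with the exact dictionary passed in.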

Sorry for wall of text, I hate that I always produce them 😄 I'm a bit verbose 😛

vertexpreacher commented 5 years ago

Hey! That is not so bad; it's ok, wall of text happens. Should be ok, me thinks. Thank you for the feedback.

stcktrce commented 5 years ago

Please ignore the trolls, @blitzmann. An opt-in should be more than enough, as @skyride pointed out. No alliance is going to tell their members to stop using a widely used and trusted open-source tool because it added opt-in usage tracking.

IndictionEve commented 5 years ago

Personally, I don't like it when big companies collect data. For small projects, where the data is not transmitted to third parties, it is OK for me. But I would recommend collecting this data as transparently as possible. A log containing the last 1000 sent records could reduce the paranoia of most of the users. Those who still don't trust the log can compare the data with Wireshark, if they want.
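
One way such a log could be kept, as a sketch only: a bounded in-memory buffer that drops the oldest entry once the limit is reached. `SentLog` is a hypothetical name, and writing the buffer to disk is left out.

```python
from collections import deque


class SentLog:
    """Keeps the most recent payloads that were sent, oldest first."""

    def __init__(self, limit: int = 1000):
        # deque with maxlen discards the oldest item automatically when full.
        self._records = deque(maxlen=limit)

    def record(self, payload: str) -> None:
        """Store exactly what was transmitted, so it can be compared later."""
        self._records.append(payload)

    def dump(self) -> list:
        """Return the retained payloads as a plain list for display."""
        return list(self._records)
```

Recording the exact payload string, rather than a summary of it, is what would let a user compare the log against captured traffic.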

blitzmann commented 5 years ago

@IndictionEve I'm generally in the same boat, tbh.

Wireshark probably wouldn't work unless you can get it to inspect HTTPS traffic (which I'm unsure if it can). But I was thinking that we could offer a way for users to see a record of all logs sent to us via a simple web app. The usage stats package does generate a unique ID for each installation, which we can then use to say "these are the stats you have sent to us, and this is the data that it contained".

I will hopefully be able to stand up a proof of concept in the next few weeks.

DarkFenX commented 5 years ago

Is it sensitive enough to require some kind of authentication mechanism? Because if it is, and we cannot come up with some decent one, the log way is probably better.

blitzmann commented 5 years ago

I think the unique ID is a hash, so it would be incredibly difficult to randomly guess, but even if that did happen, we won't be tracking anything sensitive (in my opinion anyway). As stated, simple metrics and feature usage, I would think :)
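
A back-of-the-envelope check of "incredibly difficult to guess", assuming the ID carries 128 random bits (an assumption; the actual length used by the package was not checked here). `guess_probability` is a hypothetical helper that applies a simple union bound:

```python
def guess_probability(issued_ids: int, attempts: int, bits: int = 128) -> float:
    """Upper bound on the chance that any of `attempts` random guesses hits an issued ID."""
    # Each guess matches one of the issued IDs with probability issued_ids / 2**bits;
    # summing over all attempts gives an upper bound, capped at 1.
    return min(1.0, issued_ids * attempts / 2 ** bits)
```

With a million issued IDs and a billion guesses, `guess_probability(1_000_000, 10**9)` comes out at roughly 3e-24, so under that assumption a lookup page keyed only by the ID would be hard to enumerate.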

pyfa-org / Pyfa

Anonymous usage statistics #1898