tldr-pages / tldr

📚 Collaborative cheatsheets for console commands
https://tldr.sh

Feature request: Save failed requests to a leaderboard #5904

Open qpwo opened 3 years ago

qpwo commented 3 years ago

When I run tldr doesnotexist, is that request logged anywhere? If not, logging it would be a great way to direct would-be authors.

My understanding is that tldr currently has no server or anything, and the files are all just pulled from github.

Then how can this be done without making a backend and dealing with all that? Ideas:

If people think this is a good idea, I think I could help with a PR. Sorry if this has been discussed before – I couldn't find anything with google searches.

jxu commented 3 years ago

Is this a good idea? I do not want tldr connecting to the internet every time I make a typo, especially without my permission.

bl-ue commented 3 years ago

Personally, I think it would be a very useful thing.

I do not want tldr connecting to the internet every time I make a typo

Agreed. Maybe we could accumulate a list of requests and then submit them in batch every now and then, like once a day?

especially without my permission.

Certainly not. There would be a configuration option, true by default, and the first time tldr is run (or during installation or whatnot), it would ask the user if they'd like to send anonymous analytics, which would include the command run and the OS (macOS, Windows, Linux, Android, etc.).

qpwo commented 3 years ago

the first time tldr is run, it would ask the user if they'd like to send anonymous analytics

An alternative is giving a "send vote y/n" prompt after a failure. Or "sending vote in 3..2..1.. Press space to cancel".

The problem as I see it is that this naturally leads to more ambitious feature requests, such as a way to add new entries from the terminal, or pages for language libraries. That could over-complicate the project or spread it too thin, and it has a pretty clear scope right now.

bl-ue commented 3 years ago

I'd really like it to be a general analytics system — it would be very nice (if not useful...🤔) to see the usage of this project. We've been discussing it forever, but...

marchersimon commented 3 years ago

Especially for new contributors, it would be really nice to see their pages being used. However, collecting and sending data would probably scare too many users off (just look at how people reacted to the new telemetry opt-in screen in Audacity).

I like the y/n prompt for sending requests, but I feel like that could get very annoying. Maybe print a "Run tldr --request <command>" message when tldr <command> has failed?

jxu commented 3 years ago

"sending vote in 3..2..1.. Press space to cancel" is auto opt-in and may be a violation of GDPR.


sbrl commented 3 years ago

Hey there! Great suggestion here. Unfortunately, it's not particularly practical to implement, because tldr-pages has many community-developed clients. However, we do have the web client (https://tldr.ostera.io/), which has been discussed before (I can't remember where).

Additionally, on the useful scripts and programs wiki page there's a script I wrote called tldr-missing-pages that lists missing pages based on man pages and your shell history.

marchersimon commented 3 years ago

It would still be helpful even if only the Node.js client supported this. I don't assume users of different clients use substantially different commands overall.

sbrl commented 3 years ago

In terms of the easiest place to implement this, I'd suggest that https://tldr.ostera.io/ might be the best candidate.

bl-ue commented 3 years ago

If we had a server (I assume you'd provision that @sbrl?), we could run a simple Node server that recorded hits to a page, whether or not the page was found, the ID of the client (we'd give each client its own ID), and the OS it's running on, and saved it all to a database. Pretty soon we'd have a lot of data, and we could visualize it easily.

It's just a must-have.
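For illustration, here is a minimal sketch of such a server; the /hit endpoint, field names, and in-memory storage are assumptions for the example, not anything agreed on in this thread:

```ts
// Hypothetical sketch of the metrics server described above.
// Endpoint name, field names, and in-memory storage are assumptions;
// a real deployment would persist to a database.
import { createServer } from "node:http";

interface Hit {
  page: string;   // page name the user requested
  found: boolean; // whether a page existed for it
  client: string; // client ID (each client would get its own)
  os: string;     // reported operating system
  at: number;     // server-side timestamp
}

const hits: Hit[] = [];

createServer((req, res) => {
  if (req.method === "POST" && req.url === "/hit") {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      try {
        const { page, found, client, os } = JSON.parse(body);
        hits.push({ page, found, client, os, at: Date.now() });
        res.writeHead(204).end();
      } catch {
        res.writeHead(400).end();
      }
    });
    return;
  }
  res.writeHead(404).end();
}).listen(8080);
```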

SethFalco commented 3 years ago

I do not want tldr connecting to the internet every time I make a typo, especially without my permission.

I have the same opinion.

Maybe we could accumulate a list of requests and then submit them in batch every now and then, like once a day?

I think something like this should be a result of explicit user action only. It seems a bit shady if it connects to another server at unspecified times.

In my opinion, what the node client does is great. If it's not found, it suggests that users can make a pull-request. This should probably also suggest that users can make a feature request (issue) as well.

Even better, just add a command, and under the "command not found" response print: You can submit a request for {} by running: tldr --request {}
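A rough sketch of that flow, assuming a hypothetical --request flag and a placeholder server URL; nothing is sent unless the user explicitly runs the request command:

```ts
// Hypothetical client-side flow: suggest --request on a miss, and only
// contact the (assumed) metrics server when the user explicitly asks to.
// Assumes Node 18+ for the global fetch.
function onPageNotFound(command: string): void {
  console.error(`Page not found for "${command}".`);
  console.error(`You can request it with: tldr --request ${command}`);
  console.error(`Or open a pull request: https://github.com/tldr-pages/tldr`);
}

async function requestPage(command: string): Promise<void> {
  // Explicit user action: only now does anything leave the machine.
  await fetch("https://metrics.example.org/hit", { // placeholder URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ page: command, found: false }),
  });
  console.log(`Recorded a request for "${command}". Thanks!`);
}
```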

true by default

I think this capability could be cool, but I have a firm opinion on absolutely in no way making it true by default.

Why should it be true by default? If a user wants to contribute to the list of suggestions, they're welcome to opt-in for it. It is their choice.

Or "sending vote in 3..2..1.. Press space to cancel".

In my opinion, it shouldn't count down on the user at all; that sort of thing stresses users out (3...2...1...). Defaulting to a choice that may not be in the user's interest is already bad. Putting pressure on the user and rushing them to act is even worse. Deadlines are scary.

Key to that notion of expression is that it must reflect the user’s preference, not the preference of some institutional or network-imposed mechanism outside the user’s control. - https://www.w3.org/2011/tracking-protection/drafts/tracking-dnt.html#determining

Obviously, this isn't personal data, but in my opinion any data generated by a user, personal or not, should require consent to be shared.

anonymous analytics, which would include the command run and the OS (macOS, Windows, Linux, Android)

The above is especially true if you're planning to store more information. The request alone will provide the IP (+ geolocation), user-agent (client/OS/version), etc. Things like this shouldn't be sent by default. Even if you won't use the IP or geolocation, it's still part of the request and being processed.

The server would probably just accept the suggestions as they come, so clients would be responsible for obtaining consent and for how it's handled. However, I believe that if tldr set up such a thing, there should be a requirement for appropriate use of the service and for getting appropriate consent.

Sorry to be a bit of a downer here, but privacy is critical to me. It doesn't matter how many options or opportunities something provides to opt-out. If it's opt-out instead of opt-in, they don't care for privacy. Simple as that.

Edit: For clarity, this is just me dropping my overall perspective on the topic/discussion, so everyone knows what I agree or disagree with. I do recognize that others have expressed similar concerns already.

SethFalco commented 3 years ago

Oh wait, I'm stupid... ^-^'

but I feel like that can be very anoying. Maybe a Run tldr --request command message

I missed that one while I was skimming the discussion. ^-^' I can see @marchersimon already suggested a pretty solid solution. (same one I suggested in my comment)

bl-ue commented 3 years ago

Ah, though I read that note I didn't really give it attention. Sounds like a really cool idea to me.
So, if tldr ... doesn't work, it can say: run tldr --request "..." and then post my suggested info to the server. 🎉

bl-ue commented 3 years ago

One thing too — if we could see from a reliable data source what pages are used the most, we could improve them if possible.

It would also be really encouraging to new users to see if people are using their tool.

If the clients/OSes were captured too we could improve those areas as well.

bl-ue commented 3 years ago

It seems a bit silly to worry about sending your entered command to an official server that we maintain (especially after we say we do in the README, which we will of course), when you already tell someone you're using tldr just by the fact that your client downloads the pages from somewhere. Indeed, clients that don't implement caching (such as https://tldr.ostera.io/) make requests directly to the page on GitHub, thus half implementing my proposal right there.

(I'll stop talking about OS and client and only mention command for right now — the latter two are much less useful and more dangerous to capture.)

sbrl commented 3 years ago

Many sites already use Google Analytics. We can also respect the DNT header for example, and avoid sending metrics in that case. Finally, we could even explicitly prompt the user.

I recommend implementing just what page names are requested (only after the user stops typing for 5 seconds).
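For the web client, that could look roughly like the sketch below; the debounce delay, endpoint path, and use of sendBeacon are illustrative assumptions:

```ts
// Sketch for the web client: respect Do Not Track and debounce the search
// box so only the final page name is reported, never partial keystrokes.
const DEBOUNCE_MS = 5000; // "after the user stops typing for 5 seconds"
let timer: ReturnType<typeof setTimeout> | undefined;

function onSearchInput(query: string): void {
  if (navigator.doNotTrack === "1") return; // user opted out via DNT
  clearTimeout(timer);
  timer = setTimeout(() => reportPageName(query), DEBOUNCE_MS);
}

function reportPageName(page: string): void {
  // Placeholder endpoint; fire-and-forget so the UI never blocks on it.
  navigator.sendBeacon("/metrics/hit", JSON.stringify({ page }));
}
```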

I do not believe that a tldr client is a good place to implement this, due to privacy concerns.

SethFalco commented 3 years ago

One thing too — if we could see from a reliable data source what pages are used the most, we could improve them if possible.

I think for that, you could just check the notability of something, or search GitHub under the topic CLI and sort by stars?

If the clients/OSes were captured too we could improve those areas as well.

Would it be feasible, if a dataset is produced like this, for it to be released under an open data license? Assuming something like this doesn't exist already, and that the data is truly anonymous, it should be fine to do.

Then it can be used by other repositories or used for research.

It seems a bit silly to worry about sending your entered command to an official server that we maintain (esp. after we say we do in the README which we will of course), when you tell someone that you're using tldr just by the fact that your client downloads

I strongly disagree. There are two key differences:

  1. That is to read data, not write data.
  2. A reasonable person would say that is in the user's interest.

This isn't to say there's never a reason to make requests on the internet. However, I think anyone would agree that a user should be allowed to consent to it.

Checking for updates, be it for a program, a news feed, or whatever else, is fine and within the user's interest. Sending user-generated content to external servers is not.

Even Google gets explicit consent before any of their CLI tools send telemetry data anywhere.

SethFalco commented 3 years ago

If this is done, it's also important to make a privacy policy and include how users will be notified of changes to it or if they're expected to check it periodically. - https://matrix.to/#/!zXiOpjSkFTvtMpsenJ:gitter.im/$_6xIsnZL9oUFawTnCr0pKqyHQBQGvYxjTN9CCB5W54s?via=gitter.im&via=matrix.org&via=matrix.coredump.ch

bl-ue commented 3 years ago

Even Google gets explicit consent before any of their CLI tools send telemetry data anywhere.

Of course, I would never think to just start sending data without telling users. It should definitely follow the practices of products that do analytics, such as VS Code, and say, "do you consent to allowing anonymous statistical information to be stored?" blah blah blah.

sbrl commented 3 years ago

I'd suggest limiting the scope here to only the web client, and only after explicitly asking the user a very simple yes / no question (potentially even showing an example of the data we'd upload).

We also want to limit the data stored as much as possible to be only the thing the user typed (waiting a second or two to avoid capturing partial text).

I agree that learning what users type and want would be valuable, but at the same time we need to be ethical about this.

marchersimon commented 3 years ago

I think we could start with the web client, but that won't give us the whole picture by far, since probably only a few people are using it. It just makes sense to use the command line when viewing documentation for command-line tools.

However, it could be a start, and it's definitely better than nothing.

MasterOdin commented 3 years ago

Agreed that if the scope is limited to only ever being the web client, it's probably not worth bothering, as the amount of usage there is probably comically small compared to the CLI clients. For this issue, I would suggest ignoring the web client for now and focusing on how this would work for the CLI clients, since if there's no agreement on how to do it there, this discussion is dead on arrival.

For my part, I would say have the CLI clients just prompt the user for permission in some fashion like:

Share anonymous usage information with the TLDR project? We collect client, OS, and command to help us understand ecosystem usage, sent only when you update your local cache. See for details on how we collect, store, and process this data.

y/n? [n]

and then the user opts in (or not). As for sending this info, just send it at the same time as the request for the updated cache, which is a network request the user already expects to happen, so usage that doesn't currently generate network traffic won't newly generate any. If a user never updates their cache, then the data never gets sent; too bad, but it's not like the TLDR project has fallen apart for lack of this data before now.
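A sketch of that flow, with hypothetical file paths, field names, and server URL; nothing leaves the machine unless the user opted in, and only alongside a cache update:

```ts
// Hypothetical batching: queue misses locally, flush only during a cache
// update, and only if the user answered "y" to the opt-in prompt.
// Assumes Node 18+ for the global fetch.
import { appendFileSync, existsSync, readFileSync, writeFileSync } from "node:fs";

const QUEUE_FILE = "/tmp/tldr-metrics-queue.jsonl"; // placeholder path

function recordMiss(page: string, optedIn: boolean): void {
  if (!optedIn) return; // respect the opt-in choice; default is "n"
  appendFileSync(QUEUE_FILE, JSON.stringify({ page, found: false }) + "\n");
}

async function flushDuringCacheUpdate(optedIn: boolean): Promise<void> {
  if (!optedIn || !existsSync(QUEUE_FILE)) return;
  const text = readFileSync(QUEUE_FILE, "utf8").trim();
  if (!text) return;
  const batch = text.split("\n").map((line) => JSON.parse(line));
  // Piggy-back on the network request the user already expects.
  await fetch("https://metrics.example.org/batch", { // placeholder URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
  writeFileSync(QUEUE_FILE, ""); // clear the queue once delivered
}
```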

qpwo commented 3 years ago

Edit suggestion:

Share anonymous usage information with the TLDR project? We collect client, OS, and command to help us understand ecosystem usage and identify commands that need new pages, sent only when you update your local cache. See for details on how we collect, store, and process this data.

bl-ue commented 3 years ago

I like the idea of sending the stats along with the update request. Are you thinking that in the meantime, before the user updates, we collect the information locally and batch-send it when the user updates?

sbrl commented 3 years ago

Ah, I see. In that case, perhaps the Node.js or Python client would be a better place to start. I'd suggest we should draft up a privacy policy and a metrics aggregation server for this.

Batch sending is also a good idea. It should be done asynchronously though, so as to not block the displaying of the page the user wants to see.
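In code terms, that just means keeping the upload off the render path; a tiny sketch, with the render and upload functions stubbed out as assumptions:

```ts
// Render the page first, then fire off the batch upload without awaiting it,
// so a slow or unreachable metrics server never delays the output.
declare function renderPage(page: string): void;  // the client's existing display code
declare function sendBatch(): Promise<void>;      // hypothetical upload from the sketch above

function showPage(page: string): void {
  renderPage(page);                     // display immediately
  void sendBatch().catch(() => {});     // metrics are best-effort; ignore failures
}
```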

Regardless of where we start, I do agree that this would be valuable information to have. So long as we have a comprehensive privacy policy including examples of uploaded data (I wonder if there's someone with legal background here who could proofread such a privacy policy?) and explicitly ask the user / have an opt-in system, it should be fine.

It's important to be transparent about it, and ensure it's opt-in / explicitly ask rather than being opt-out.

Edit: I can provide hosting for such a metrics aggregation server, just as I do for tldr-bot.

SethFalco commented 3 years ago

We could reference other privacy policies to help write our own. On GitHub, some companies have released their privacy policies under Creative Commons:

As one of the few people who actually read privacy policies, I feel I could help with writing/proofreading it. However, I am in no way qualified to do so. (If we have someone with a real legal background, that would be better.)