visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.

[Feature Request]: Clarify telemetry and make it opt in #217

Closed talolard closed 1 year ago

talolard commented 1 year ago

Feature Name

Clarify Telemetry / Sentry

Feature Description

Hi, I've been reading through the code ahead of a POC. I noticed the calls to sentry and the init_sentry function as well as the SENTRY_OPT_OUT env var.

Collecting telemetry data is fine, however, I think it would go a long way towards ensuring trust and easing the adoption of fastdup if you communicated that that data is being collected and why, as well as providing a way to opt-in/opt-out at the program start.

I'd suggest any or all of the following:

  1. A telemetry section in the README stating that you collect that data, what you collect, why, and how to opt out
  2. A similar (but shorter) warning at program startup
  3. (Most user-friendly / most work) Display such a warning, and wait for user input on whether they would like to send data or not.
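As an illustration of options 2 and 3, here is a minimal sketch of a startup notice with an optional consent prompt. It is not fastdup code: the function name `telemetry_allowed`, the `ask` callback, and the notice wording are all hypothetical. Only the `SENTRY_OPT_OUT` variable name comes from the codebase, and the sketch assumes that merely defining it is enough to opt out.

```python
import os

# Hypothetical notice text; the real wording would be up to the maintainers.
NOTICE = (
    "fastdup sends anonymized crash reports. "
    "Define SENTRY_OPT_OUT to disable this."
)

def telemetry_allowed(env, ask=None):
    """Return True if crash reports may be sent (hypothetical helper).

    env: mapping of environment variables, e.g. os.environ.
    ask: optional callable that shows a prompt and returns the answer.
    """
    # An existing opt-out always wins and skips the notice and prompt.
    if "SENTRY_OPT_OUT" in env:
        return False
    # Option 2: always show a short notice at startup.
    print(NOTICE)
    if ask is None:
        # No prompt available: keep the current default (telemetry on).
        return True
    # Option 3: wait for explicit consent; anything but yes means no.
    answer = ask("Send crash reports? [y/N] ").strip().lower()
    return answer in ("y", "yes")

if __name__ == "__main__":
    # Interactive use: pass the built-in input() as the prompt function.
    print(telemetry_allowed(os.environ, ask=input))
```

Passing the prompt function in as an argument keeps the check usable in non-interactive runs (notebooks, CI), where blocking on input would be undesirable.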

dbickson commented 1 year ago

Hi @talolard, thanks for your feedback; we are constantly improving based on user feedback. The Sentry section is found under the "Disclaimer" section on our GitHub, and it states the following:

Disclaimer
Usage Tracking
We have added experimental crash report collection using [sentry.io](https://github.com/getsentry/). It does not collect user data other than anonymized IP address data, and it only logs the fastdup library's own actions. We do NOT collect folder names, user names, image names, or image content, only aggregate performance statistics such as total number of images, average runtime per image, total free memory, total free disk space, number of cores, etc. Collecting fastdup crashes will help us improve stability.

The code for the data collection is found [here](https://github.com/visual-layer/fastdup/blob/main/src/sentry.hpp). On macOS we use [Google crashpad](https://chromium.googlesource.com/crashpad/crashpad).

It is always possible to opt out of the experimental crash report collection via either of the following two options:

  1. Define an environment variable called SENTRY_OPT_OUT
  2. Call run() with turi_param='run_sentry=0'
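The two opt-out routes can be sketched in Python as follows. This is a rough sketch under stated assumptions: the helper name `opt_out_of_crash_reports` is hypothetical, the value `"1"` is an arbitrary choice (the disclaimer only says the variable must be defined), and the commented `fastdup.run` call uses placeholder paths and assumes `run()` accepts `input_dir` and `work_dir` alongside `turi_param`.

```python
import os

def opt_out_of_crash_reports(env=os.environ):
    """Apply both opt-out routes (hypothetical helper, not part of fastdup).

    Sets SENTRY_OPT_OUT in the given environment mapping and returns the
    keyword argument to pass to run().
    """
    # Route 1: define the environment variable. The value "1" is an
    # assumption; the disclaimer only requires that it be defined.
    env["SENTRY_OPT_OUT"] = "1"
    # Route 2: the run() parameter quoted in the disclaimer.
    return {"turi_param": "run_sentry=0"}

# Call this before fastdup is imported, in case the variable is read at
# import time (an assumption, not confirmed here).
run_kwargs = opt_out_of_crash_reports()

# Hypothetical usage with placeholder paths:
# import fastdup
# fastdup.run(input_dir="images/", work_dir="out/", **run_kwargs)
```

Either route alone should be sufficient according to the disclaimer; the sketch applies both only to show what each looks like.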

Let us know if this is not clear.

talolard commented 1 year ago

Thanks for the quick response, @dbickson. I added a PR with some suggestions to make it clearer.