gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0
30.35k stars 2.26k forks

Reduce the analytics that are collected in Gradio #8263

Closed abidlabs closed 1 month ago

abidlabs commented 1 month ago

Remove Google Analytics from Gradio apps and document the analytics that we do track from developers.

gradio-pr-bot commented 1 month ago

🪼 branch checks and previews

| Name | Status | URL |
| --- | --- | --- |
| Spaces | ready! | Spaces preview |
| Website | failed! | Details |
| Storybook | ready! | Storybook preview |
| 🦄 Changes | detecting... | |

Install Gradio from this PR

pip install https://gradio-builds.s3.amazonaws.com/840b6bef9cd79a01d3e0ab887f0c3e6ae3379d5c/gradio-4.31.0-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@840b6bef9cd79a01d3e0ab887f0c3e6ae3379d5c#subdirectory=client/python"
gradio-pr-bot commented 1 month ago

🦄 change detected

This Pull Request includes changes to the following packages.

| Package | Version |
| --- | --- |
| @gradio/app | patch |
| gradio | patch |

With the following changelog entry.

Reduce the analytics that are collected in Gradio

Maintainers or the PR author can modify the PR title to modify this entry.

#### Something isn't right?

- Maintainers can change the version label to modify the version bump.
- If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can [update the changelog file directly](https://github.com/gradio-app/gradio/edit/reduce-analytics/.changeset/shaggy-tables-fly.md).

freddyaboulton commented 1 month ago

Looks great @abidlabs !

RemiCardona commented 2 weeks ago

I'll add my own 2 euro-cents here.

First of all, this PR is a step in the right direction: it at least documents the tracking done by gradio and ways to mitigate it.

But the documentation is misleading. Simply importing gradio triggers phone-home HTTP calls (observed on 4.31.0, which my team currently has pinned; main still has the code):

$ python
Python 3.11.9 (main, May  4 2024, 11:48:10) [GCC 13.2.1 20240210] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.basicConfig(level="INFO")
>>> logging.getLogger("httpx").setLevel("INFO")
>>> import gradio
INFO:httpx:HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"
>>> 
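
The same check can be made independent of any particular HTTP client's logging. The following is a minimal sketch, not part of gradio: `record_connections` is a hypothetical helper that temporarily wraps `socket.socket.connect` and records every destination address. `json` stands in for the package under inspection. Note that a request made from a background thread could happen after the `with` block exits and would then go unrecorded.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def record_connections():
    """Yield a list that collects every address passed to socket.connect()."""
    attempts = []
    original_connect = socket.socket.connect

    def recording_connect(self, address):
        attempts.append(address)  # note the destination, then connect as usual
        return original_connect(self, address)

    # patch.object restores the original attribute when the block exits
    with mock.patch.object(socket.socket, "connect", recording_connect):
        yield attempts


with record_connections() as attempts:
    import json  # stand-in: replace with the package you want to inspect

print(attempts)  # any entries here are connections opened during the import
```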

The request is made here, in top-level code: https://github.com/gradio-app/gradio/blob/b67f7ff8f9d2fe06b306c6d852e638d471e01565/gradio/strings.py#L39-L40. It is controlled only through the GRADIO_ANALYTICS_ENABLED environment variable.
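
As a workaround sketch: since the call runs at import time, the variable has to be set before gradio is first imported. The snippet below assumes that the string `"False"` is the value that turns analytics off; `disable_gradio_analytics` is a hypothetical helper name, not a gradio API.

```python
import os


def disable_gradio_analytics() -> None:
    """Opt out before gradio is imported anywhere in the process."""
    # Assumption: gradio reads this variable at import time and treats
    # the string "False" as "analytics off".
    os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"


disable_gradio_analytics()
# import gradio  # only import after the variable has been set
```

Setting the variable in the shell or the service's environment avoids relying on import order inside the application.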

This is in incredibly poor taste (importing code should hardly ever do anything, let alone make HTTP calls) and a massive GDPR violation, as consent was never given for such data collection.
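
For illustration, one common way to keep imports free of side effects is to defer the work until the result is first needed. This is a generic sketch, not gradio's code: `make_lazy` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever would perform the real request.

```python
from functools import lru_cache
from typing import Callable


def make_lazy(fetch: Callable[[], dict]) -> Callable[[], dict]:
    """Wrap `fetch` so it runs on first use rather than at import time."""

    @lru_cache(maxsize=None)
    def get() -> dict:
        return fetch()  # executed once; later calls reuse the cached result

    return get


calls = []


def fake_fetch() -> dict:
    # Hypothetical stand-in for a network request.
    calls.append("fetch")
    return {"greeting": "hello"}


# Defining the accessor at module level triggers no fetch.
get_messages = make_lazy(fake_fetch)
```

With this shape, importing the module costs nothing, and the fetch happens only if and when `get_messages()` is actually called.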

IMHO gradio should come free of all external calls, and the environment variable should simply not exist. There are many ways to gather insight into how gradio is used without compromising basic privacy: user surveys advertised through x/twitter, github, newsletters, etc.

The analytics code could be kept around but in a separate package, the way Debian does it:

Thanks for reading.