guysoft / RealtimePi

An out-of-the-box raspberrypi/raspbian distro with a realtime kernel
GNU General Public License v3.0

Explain usage-statistics.service data collection purpose #21

Closed thisven closed 4 years ago

thisven commented 4 years ago

First of all thank you very much for maintaining a ready-to-use Raspbian image driven by a real-time kernel! I use it for a low latency audio application with JACK audio server and it works great.

While I was looking for unneeded services to disable, I found that there is a service called usage-statistics.service located in /etc/systemd/system/ which doesn't seem to be provided by Raspbian itself:

[Unit]
Description=Send anonymous statistics of a distro
After=multi-user.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/boot_report

[Install]
WantedBy=multi-user.target

The service starts the attached script /usr/bin/boot_report (attachment: boot_report.txt).

It generates a unique id and gathers version and variant information to be sent to: https://realtimepi-tracking.gnethomelinux.com/boot?id=ID_HERE&version=VERSION_HERE&variant=VARIANT_HERE which looks to me like fingerprinting and tracking devices using RealtimePi every 5 seconds.

Can you please explain why this is necessary and what you do with that data? Another aspect that makes me sceptical is that there's no information about tracking in the project README.

guysoft commented 4 years ago

Hey, this is a module that provides me with anonymous statistics on how many installations there are. The source is at: https://github.com/guysoft/CustomPiOS/blob/devel/src/modules/usage-statistics/filesystem/root/etc/systemd/system/usage-statistics.service

It runs once when the device boots and then terminates. Not sure where you got the 5 seconds thing; you can see it's set as Type=oneshot, as you pasted above. Also, the ID itself is not sent. A hash of it is sent, so it is obfuscated, to keep the data anonymous.
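To illustrate the obfuscation described here, the following is a minimal sketch of hashing a serial string with the same `sha1sum` and `awk` steps that appear in the boot_report script quoted later in this thread. The function name `hash_id` and the serial value are made up for illustration; on a real Pi the serial comes from /proc/cpuinfo.

```shell
#!/bin/sh
# Hypothetical helper: hash an identifier with sha1sum and keep only
# the hex digest column, as the boot_report pipeline does.
hash_id() {
    printf '%s\n' "$1" | sha1sum | awk '{ print $1 }'
}

# Made-up serial for illustration only.
example_serial="00000000deadbeef"
hashed=$(hash_id "$example_serial")
echo "id sent to server: $hashed"
```

The same input always yields the same 40-character digest, which is what allows repeated boots of one device to be counted as a single installation.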

The output I get is the graph below, and I would be happy to share any other insights from this. I now manage over 7 different distros, and counting how many devices are out there is currently the only way I can prioritize which ones I should maintain (that and donations, which I don't really get). For example, ElectricsheepPi was completely unmaintained, and then I got a single issue from someone asking for a new version. When I released one, another 10 people downloaded it right away, people who never spoke up. (image: Screenshot_20200312_012454)

I will note that I am not hiding what this source does; it's out there and 100% FOSS. It's also placed there because it tends to bring out people using my distros for commercial purposes who don't contribute back to the project, and it's a good opportunity to encourage them to do so.

I hope this explains usage-statistics.service, closing. You are welcome to ask more questions.

thisven commented 4 years ago

Thanks for your quick reply. I think I've found the reason:

until $(curl --output /dev/null --silent --head --fail https://realtimepi-tracking.gnethomelinux.com/boot?id=`cat /proc/cpuinfo | grep -i '^Serial' | awk '{    print $3 }' | sha1sum | awk '{    print $1 }' `'&version='`cat /etc/realtimepi_version | head -n 1`'&variant='`cat /etc/dist_variant | head -n 1` ); do
    echo "$(date):  try to connect and report boot"
    sleep 5
done

The 1st line of the boot_report script is repeated until it succeeds. My installation is used offline, so the request never succeeds and the 2nd and 3rd lines are executed over and over, every 5 seconds.
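The control flow can be reproduced without a network. The sketch below keeps the same `until ... do ... done` structure but replaces the curl call with a hypothetical stand-in, `report_boot`, that fails twice and then succeeds. The name and the failure count are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical stand-in for the curl call: fails until its 3rd invocation.
attempts=0
report_boot() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# Same structure as boot_report: the body runs after every failed attempt.
until report_boot; do
    echo "attempt $attempts failed, retrying"
    sleep 0   # the real script sleeps 5 seconds here
done
echo "succeeded after $attempts attempts"
```

On an offline machine the real command never returns success, so the loop body keeps running even though the unit is declared as Type=oneshot.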

I can understand the need to check whether a project is still worth spending time on and to get that information in a passive way, but I would encourage a more privacy-friendly approach:

What about integrating a one-time questionnaire on first boot asking users whether they consent to having their installation counted in order to keep this project maintained? Hints for donation would be possible, too. This way data collection would (at least) be optional and maybe compliant with the General Data Protection Regulation (GDPR). I'm talking about something like this: (image: adventure_dialog-yesno)
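A rough sketch of such an opt-in gate, assuming the `dialog` utility is installed, might look like the following. Every name here (`CONSENT_FILE`, `ask_consent`, `reporting_allowed`) and the file location are hypothetical and not part of RealtimePi or CustomPiOS.

```shell
#!/bin/sh
# All names below are hypothetical; this is not existing project code.
CONSENT_FILE="${CONSENT_FILE:-/tmp/usage-statistics-consent}"

# Ask once and store "yes" or "no". dialog --yesno exits 0 on "Yes".
ask_consent() {
    if dialog --yesno "Allow this installation to be counted anonymously?" 8 60; then
        echo yes > "$CONSENT_FILE"
    else
        echo no > "$CONSENT_FILE"
    fi
}

# Succeeds only if the user has explicitly opted in.
reporting_allowed() {
    [ -f "$CONSENT_FILE" ] && [ "$(cat "$CONSENT_FILE")" = "yes" ]
}

# Intended flow at boot (commented out so the sketch has no side effects):
#   [ -f "$CONSENT_FILE" ] || ask_consent
#   reporting_allowed && /usr/bin/boot_report
```

With this arrangement a missing or negative answer means nothing is sent, so reporting stays off by default.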

Another point, regarding your description of the situation: I'd really like to support RealtimePi if I can. I'm more of a system administrator than a developer, but maybe there are other things I can do. I tried compiling the real-time kernel for Raspberry Pi myself last year, before this project integrated 4.19.x, but failed, and I followed the issues for some time until I found new releases of RealtimePi that included it. So I'm deeply grateful.

guysoft commented 4 years ago

I appreciate your attempt to add a solution. However, as noted, no private information is stored. That is why the Pi ID is obfuscated to a hash; I added that explicitly to be GDPR compliant. If you want to change it, you are welcome to fork the project and maintain it, it's FOSS. I would like to see that happening personally. I am not negotiating this. I can see you opened this account only to join GitHub and ask me to remove the only tool I have to figure out if people want this project - If you want this so badly then contribute.