kissake / unattended_data_collection

This is a project intended to support unattended data collection in a way that protects the data being collected. The initial project is aimed at audio recording using a Raspberry Pi Zero (simple, low hanging fruit), with the intention of being extensible for other uses.
GNU General Public License v3.0
audio gnupg-2 gnupg2 raspberry-pi raspberry-pi-zero-2-w raspberrypi

Unattended Data Collection

This is a project intended to support unattended data collection in a way that protects the data being collected.

Specifically, there are two parts:

The initial project is aimed at audio recording using a Raspberry Pi Zero (simple, low hanging fruit), with the intention of being extensible for other uses.

Example

Example use case (functionality existing today):

Features

Features:

Motivation

I want to see if I am waking up in the middle of the night due to noises, or ...?

To do that, I need to record data (audio) in a private context (e.g. a bedroom) in such a way that the audio can be reviewed at a later time, but:

I don't want my pillow-talk to fall into someone else's hands without my consent.

A future enhancement is to permit encryption such that all N persons who might have a privacy interest in the audio must consent.
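For reference, a single gpg --encrypt with several --recipient options lets any one of those recipients decrypt, which is weaker than requiring all of them. A minimal sketch of the all-must-consent variant nests one encryption layer per person. It assumes every person's public key is already in the local GnuPG keyring; the function name and the example names alice and bob are hypothetical, and this is not something the project does today.

```shell
#!/bin/sh
# Hypothetical helper: encrypt stdin so that ALL listed recipients are
# needed to recover it. Each recipient's key wraps the previous layer, so
# the last recipient named holds the outermost layer.
# Assumes every recipient's public key is already in the GnuPG keyring.
encrypt_for_all() {
  if [ "$#" -eq 0 ]; then
    # No recipients left: pass the (already layered) data through.
    cat
  else
    recipient=$1
    shift
    # Encrypt for this recipient, then hand the ciphertext to the next layer.
    gpg --batch --trust-model always --encrypt --recipient "$recipient" \
      | encrypt_for_all "$@"
  fi
}
```

Usage might look like arecord -f cd -d 10 | encrypt_for_all alice bob > clip.wav.gpg. Decryption then runs in reverse order: bob removes the outer layer with gpg --decrypt, and only then can alice remove the inner one.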

This raises an interesting question w/r/t privacy, FWIW: if we wanted to do this at huge scale, how might we do it?

Requirements

In this form, you'll want:

Installation or Getting Started

Get this data

You can get this software here from GitHub. Lots of tutorials on that part.

Install Raspberry Pi OS and make it accessible via ssh

I did the Raspberry Pi OS install without the benefit of HDMI, a USB keyboard, etc. by following the directions here: https://thedatafrog.com/en/articles/raspberry-pi-zero-headless-install/

unattended_data_collection setup

I created the public / private key pair using the setup.sh script. You should name the keys 'audio', or else update the crontab file to use the new name. The script asks for a passphrase to protect the private key (possibly twice, to confirm it).
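As a rough sketch of an equivalent GnuPG setup (an assumption, not the contents of setup.sh), the key pair is generated once on a trusted machine and only the public half is exported for the Pi, since encrypting recordings needs nothing else. The function names and the output file name audio.pub.asc are hypothetical.

```shell
#!/bin/sh
# Sketch only: an equivalent of the key setup, not the project's setup.sh.
# KEY_NAME matches the 'audio' name the crontab expects.
KEY_NAME="audio"

make_keypair() {
  # Interactive: gpg prompts (via pinentry) for the private key passphrase.
  # 'default default never' asks for the standard primary key plus
  # subkey, with no expiry date.
  gpg --quick-generate-key "$KEY_NAME" default default never
}

export_public_key() {
  # Only the public key is needed on the Pi to encrypt recordings.
  gpg --armor --export "$KEY_NAME" > "$KEY_NAME.pub.asc"
}
```

On the Pi, the exported key would then be imported with gpg --import audio.pub.asc.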

Install collection tools

I copied the project files to the Raspberry Pi 'pi' user's home directory using scp.
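A minimal sketch of that copy step, assuming the Pi is reachable as raspberrypi.local (the default hostname on Raspberry Pi OS; yours may differ) and the project lives in a local directory named unattended_data_collection. The PI_HOST variable and copy_to_pi function are hypothetical.

```shell
#!/bin/sh
# Sketch: copy the project directory to the 'pi' user's home directory.
# PI_HOST defaults to raspberrypi.local (an assumption); override as needed.
PI_HOST="${PI_HOST:-raspberrypi.local}"

copy_to_pi() {
  # $1 = local directory to copy; an empty remote path means the home dir.
  scp -r "$1" "pi@${PI_HOST}:"
}
```

Usage: copy_to_pi unattended_data_collection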

Install software dependencies

I installed the relevant software:

usbmount notes

You will probably (as of this writing) find that

systemctl show --property=PrivateMounts systemd-udevd.service

shows the output:

PrivateMounts=yes

In my experience, this means the USB drive won't correctly mount to /media/usb0/ when connected. Ideally, you could fix this with the command:

systemctl set-property systemd-udevd.service PrivateMounts=no

However, due to systemd, we can't have nice things (I may file a bug report, but I'm open to hearing why I'm wrong). Instead, you'll need to add an override for the service with the command:

sudo systemctl edit systemd-udevd.service

And add the following two lines near the top of the file, between the comment that starts "### Anything between here and ..." and the one that starts "### Lines below this ...":

[Service]
PrivateMounts=no

Putting them below the second comment is an exercise in futility: everything below that line is discarded when you save.
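If you'd rather script this step than use the interactive editor, a sketch that writes the same two lines directly as a drop-in file is below. It assumes the standard location systemctl edit uses for overrides (/etc/systemd/system/systemd-udevd.service.d/override.conf) and must run as root; the function names are hypothetical.

```shell
#!/bin/sh
# Non-interactive alternative to 'sudo systemctl edit systemd-udevd.service':
# write the same [Service] override as a drop-in file, then reload.
DROPIN_DIR="/etc/systemd/system/systemd-udevd.service.d"

write_udevd_override() {
  # $1 optionally overrides the drop-in directory (handy for testing).
  dir=${1:-$DROPIN_DIR}
  mkdir -p "$dir"
  printf '[Service]\nPrivateMounts=no\n' > "$dir/override.conf"
}

apply_udevd_override() {
  write_udevd_override
  systemctl daemon-reload
  systemctl restart systemd-udevd.service
  # Should now print "PrivateMounts=no".
  systemctl show --property=PrivateMounts systemd-udevd.service
}
```

The reboot later in this guide restarts udevd anyway, so the explicit restart is only there to confirm the change right away.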

Activate the data collection

The active collection and export functions run through cron. A recording is started every 10 minutes, on minutes ending in 0 (:00, :10, :20, and so on), and the tool that copies data to the USB drive is started at reboot and keeps running until it crashes.
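A minimal sketch of a single recording job (not the project's actual script), assuming the microphone is the default ALSA capture device and the public key is imported under the name 'audio'. Audio is piped straight from arecord into gpg, so unencrypted audio never lands on the SD card. The output directory, file naming, and 590-second duration are assumptions.

```shell
#!/bin/sh
# Sketch of one recording job; not the project's actual script.
# OUT_DIR, the naming scheme, and DURATION are assumptions.
OUT_DIR="${OUT_DIR:-/home/pi/recordings}"
DURATION=590   # a bit under 10 minutes so back-to-back jobs don't overlap

record_clip() {
  mkdir -p "$OUT_DIR"
  out="$OUT_DIR/$(date +%Y%m%d-%H%M%S).wav.gpg"
  # arecord streams a WAV file to stdout when no file name is given;
  # gpg encrypts it to the 'audio' public key as it arrives.
  # Write to a temporary name and rename only once gpg succeeds, so a
  # half-finished recording never appears under its final name.
  arecord -q -f cd -t wav -d "$DURATION" \
    | gpg --batch --trust-model always --encrypt --recipient audio \
          --output "$out.part" \
    && mv "$out.part" "$out"
}
```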

Next, install the crontab as the 'pi' user:

crontab crontab
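For readers unfamiliar with cron, entries implementing the schedule described above could look like the output of this sketch. The script paths are hypothetical placeholders, not the file names used in the project's crontab.

```shell
#!/bin/sh
# Sketch of crontab entries matching the described schedule.
# RECORD_CMD and EXPORT_CMD are hypothetical placeholder paths.
RECORD_CMD="/home/pi/record_clip.sh"
EXPORT_CMD="/home/pi/export_to_usb.sh"

print_crontab() {
  # */10 in the minute field fires at :00, :10, :20, :30, :40 and :50.
  printf '*/10 * * * * %s\n' "$RECORD_CMD"
  # @reboot runs the command once, when cron starts after boot.
  printf '@reboot %s\n' "$EXPORT_CMD"
}
```

print_crontab | crontab - would install the result directly; like crontab crontab, this replaces the pi user's existing crontab rather than adding to it.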

Make sure to review the usbmount notes above, and then reboot to activate the automatic export of data when you plug in the USB drive:

sudo reboot
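A minimal sketch of such an export loop, assuming usbmount mounts the drive at /media/usb0, encrypted recordings live in /home/pi/recordings, and a recording only appears under its final *.gpg name once it is complete. The function names, paths, and 30-second polling interval are all assumptions rather than the project's actual tool.

```shell
#!/bin/sh
# Sketch of a USB export loop started at boot; paths are assumptions.
SRC_DIR="${SRC_DIR:-/home/pi/recordings}"
USB_DIR="${USB_DIR:-/media/usb0}"

export_once() {
  # $1 = source directory, $2 = destination directory
  for f in "$1"/*.gpg; do
    [ -e "$f" ] || continue          # the glob matched nothing
    dest="$2/$(basename "$f")"
    if [ ! -e "$dest" ]; then
      # Copy under a temporary name first so a drive pulled mid-copy
      # leaves no truncated file under the final name.
      cp "$f" "$dest.part" && mv "$dest.part" "$dest"
    fi
  done
}

export_loop() {
  while true; do
    if mountpoint -q "$USB_DIR"; then
      export_once "$SRC_DIR" "$USB_DIR"
      sync   # flush writes before the drive might be unplugged
    fi
    sleep 30
  done
}
```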

Usage

Example use case (functionality existing today):

Caveats

In situations where there is no wireless access, the system clock will not be updated using NTP. Instead, it relies only on the system's own clock while running, plus the fake-hwclock feature (on the Raspberry Pi), which uses state recorded on disk to try to enforce monotonically increasing time.

Because this platform lacks many features we often take for granted, it is possible, if not likely, that 1) the system will spend significant amounts of time without power, 2) the system will have no battery-backed clock (e.g. a CMOS battery or similar) to keep time while it is powered off, and 3) the system will be shut down uncleanly (preventing fake-hwclock from saving the current time).

While the system will work to mitigate this in part by reporting the number of boots detected, along with the time relative to the most recent boot, in this situation it would be wise to take some step to record the current time at some point after recording has begun. One way to do this would be by speaking the current time within range of the microphone.
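The boot count and time-since-boot mentioned above could be tracked along these lines: a counter file bumped once per boot, combined with the uptime from /proc/uptime in each recording's name. This is a sketch under assumptions, not necessarily the project's approach; the counter file location, function names, and naming format are hypothetical.

```shell
#!/bin/sh
# Sketch of boot-relative bookkeeping; file location and name format
# are hypothetical.
BOOT_COUNT_FILE="${BOOT_COUNT_FILE:-/home/pi/boot_count}"

# Run once per boot (e.g. from an @reboot cron job): increment and print.
bump_boot_count() {
  n=$(cat "$BOOT_COUNT_FILE" 2>/dev/null || true)
  n=$(( ${n:-0} + 1 ))
  # Writing a temp file and renaming it makes it much less likely that an
  # unclean shutdown leaves a truncated counter behind.
  echo "$n" > "$BOOT_COUNT_FILE.tmp" && mv "$BOOT_COUNT_FILE.tmp" "$BOOT_COUNT_FILE"
  echo "$n"
}

# Prints whole seconds since boot; $1 optionally overrides /proc/uptime.
uptime_seconds() {
  cut -d ' ' -f 1 "${1:-/proc/uptime}" | cut -d . -f 1
}

# Prints a name like boot0042-up0003600s.wav.gpg.
clip_name() {
  printf 'boot%04d-up%07ds.wav.gpg\n' "$(cat "$BOOT_COUNT_FILE")" "$(uptime_seconds "${1:-}")"
}
```

Names built this way still sort correctly within a boot and say which boot they came from, even when the wall clock is wrong; a spoken time check, as suggested above, then anchors a boot to real time.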

Reference

Contributors

Just me so far, but I'm open to help. I'm new to git / GitHub, so please be gentle.

I know there are a LOT of things to clean up, and I am not 100% confident in the cryptography part. I know it works in a few ways:

There are also lots of hardcoded things that need cleaning up.

License

This is being released under GNU General Public License version 3. License information to be updated in relevant files before long.

Of course, only my contributions are so licensed. I believe the files in this GitHub project are exclusively my own work with the exception of some crontab boilerplate.
I am making use of Debian as my base / baseline; this would have gone nowhere without the great folks working on that project and in that community.