Closed svc-user closed 2 years ago
I think this is a great idea.
I have forked the repo and have a branch made with the change. I haven't tested it yet, but will do later.
I'm happy to do some testing as well; create a pull request on GitHub when you're ready.
This is a great idea. @CaiusX and I have discussed how to implement this change, and have landed on a method slightly different than the one you proposed.
Instead of having server.py implement privacy based on the presence of an argument/flag/parameter, we think it would be best to have a user-adjustable "Privacy Threshold."
The idea will be a slider in "Tools" > "Settings" > "Advanced Settings" that will adjust the "Privacy Threshold," allowing the number (calculated as a percentage of the labels.txt file [this takes into account short custom species lists]) to be adjusted from its default, 0, up to 50% of the species list. <---This is the proposed range while the actual range may differ based on what works.
The Privacy Threshold will be a new birdnet.conf variable server.py will use to adjust its sensitivity to human sounds.
Right now, enabling the "Privacy Mode" in effect sets the "Privacy Threshold" to 100, meaning if Human is predicted for the audio sample anywhere within the top 100 predictions, the prediction is labeled as "HUMAN" and no audio from that sample is retained.
The proposed shift would have the "Privacy Threshold" calculate a percentage of the labels.txt file. For example, to approximate the current "Privacy Mode" setting of 100, the proper "Privacy Threshold" would be ~1.6%. For practical purposes, the slider will likely increase the "Privacy Threshold" percentage value by 2%.
Example 1:
PRIVACY_THRESHOLD=2%
Number of species in labels.txt = 6362
0.02 * 6362 = 127.24 species
Rounded = 127 species
If a Human is predicted anywhere among the top 127 predictions, the sample will be considered of human origin and no data will be collected.
Example 2:
PRIVACY_THRESHOLD=4%
Number of species in labels.txt = 6362
0.04 * 6362 = 254.48 species
Rounded = 254 species
If a Human is predicted anywhere among the top 254 predictions, the sample will be considered of human origin and no data will be collected.
Example 3, using a custom_species_list.txt:
PRIVACY_THRESHOLD=50%
Number of species in custom_species_list.txt = 120
0.5 * 120 = 60 species
If a Human is predicted anywhere among the top 60 predictions, the sample will be considered of human origin and no data will be collected.
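The arithmetic in the three examples can be sketched in Python. This is only an illustration, not the actual server.py code: the function names, the "Human" label string, and the assumption that predictions arrive as a list of labels ordered from most to least confident are all hypothetical.

```python
def privacy_cutoff(threshold_percent, num_labels):
    """Convert a Privacy Threshold percentage into a top-N cutoff."""
    # Same arithmetic as the examples: a percentage of the species list, rounded.
    return round(threshold_percent / 100 * num_labels)


def is_human(ranked_labels, threshold_percent, human_label="Human"):
    """Return True if the human label falls within the top-N predictions.

    ranked_labels: every label in the species list, ordered from most to
    least confident (an assumed input shape, not taken from server.py).
    """
    cutoff = privacy_cutoff(threshold_percent, len(ranked_labels))
    return human_label in ranked_labels[:cutoff]


print(privacy_cutoff(2, 6362))   # 127
print(privacy_cutoff(4, 6362))   # 254
print(privacy_cutoff(50, 120))   # 60
```

With the default of 0 the cutoff is 0, so no sample is ever treated as human, which matches the idea of 0 meaning the filter is off.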
Hopefully that explanation isn't too complicated -- I don't feel I've done the best job explaining, but hopefully you'll see that the shift is from a binary choice (Privacy Mode enabled or disabled) to an adjustable threshold (Privacy Threshold values from 0% - 50%)
Thanks again for the great idea and observation
Instead of having server.py implement privacy based on the presence of an argument/flag/parameter, we think it would be best to have a user-adjustable "Privacy Threshold."
This makes good sense. I can imagine scenarios where distant chatter is inevitable because of a semi-urban location, but you still want very clear human speech filtered out, in which case tweaking the sensitivity makes sense.
The idea will be a slider in "Tools" > "Settings" > "Advanced Settings" that will adjust the "Privacy Threshold," allowing the number (calculated as a percentage of the labels.txt file [this takes into account short custom species lists]) to be adjusted from its default, 0, up to 50% of the species list. <---This is the proposed range while the actual range may differ based on what works.
The Privacy Threshold will be a new birdnet.conf variable server.py will use to adjust its sensitivity to human sounds.
Mhmm nods.
Right now, enabling the "Privacy Mode" in effect sets the "Privacy Threshold" to 100, meaning if Human is predicted for the audio sample anywhere within the top 100 predictions, the prediction is labeled as "HUMAN" and no audio from that sample is retained.
The proposed shift would have the "Privacy Threshold" calculate a percentage of the labels.txt file. For example, to approximate the current "Privacy Mode" setting of 100, the proper "Privacy Threshold" would be ~1.6%. For practical purposes, the slider will likely increase the "Privacy Threshold" percentage value by 2%.
Example 1:
PRIVACY_THRESHOLD=2%
Number of species in labels.txt = 6362
0.02 * 6362 = 127.24 species
Rounded = 127 species
If a Human is predicted anywhere among the top 127 predictions, the sample will be considered of human origin and no data will be collected.
Example 2:
PRIVACY_THRESHOLD=4%
Number of species in labels.txt = 6362
0.04 * 6362 = 254.48
Rounded = 254 species
If a Human is predicted anywhere among the top 254 predictions, the sample will be considered of human origin and no data will be collected.
Great examples - makes sense. I've just looked through the code again with the above examples in mind, and this makes perfect sense. Looking at the predict method, I have a rough idea of how this would be implemented.
Example 3, using a custom_species_list.txt:
PRIVACY_THRESHOLD=50%
Number of species in custom_species_list.txt = 120
0.5 * 120 = 60 species
If a Human is predicted anywhere among the top 60 predictions, the sample will be considered of human origin and no data will be collected.
How the custom species list works is not immediately obvious to me, but I'll figure it out :)
Hopefully that explanation isn't too complicated -- I don't feel I've done the best job explaining, but hopefully you'll see that the shift is from a binary choice (Privacy Mode enabled or disabled) to an adjustable threshold (Privacy Threshold values from 0% - 50%)
Thanks again for the great idea and observation
I'm still up for the challenge. As originally this will still get rid of the privacy_server.py script and the filtering will happen in server.py.
While on it, I would assume the PRIVACY_MODE setting should be removed from birdnet.conf and the update_birdnet.sh script should take care of ensuring that the birdnet_server.service doesn't reference the privacy_server.py.
Should I base the changes on the master or forms branch?
I also think it would be good to show each individual detection's "Human" prediction placement in some sort of debug mode, so if I see I have a detection with human voices in it, I can check the "Human" value of that detection and adjust my slider accordingly.
Maybe for a start the human confidence could be logged in the database. I agree that it shouldn't always be visible, as it's probably not something you're interested in unless you're tweaking the setting.
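One way to make that value easy to inspect is to record where "Human" ranked in each sample. A rough sketch, assuming predictions are available as (label, confidence) pairs; the function name and the pair format are illustrative and not the project's actual data structures:

```python
def human_rank(predictions, human_label="Human"):
    """Return the 1-based rank of the human label, or None if it is absent.

    predictions: iterable of (label, confidence) pairs in any order
    (a hypothetical format used only for this illustration).
    """
    ordered = sorted(predictions, key=lambda p: p[1], reverse=True)
    for rank, (label, _confidence) in enumerate(ordered, start=1):
        if label == human_label:
            return rank
    return None


sample = [("Robin", 0.71), ("Human", 0.12), ("Wren", 0.40)]
rank = human_rank(sample)
# Express the rank as a percentage of the list so it can be compared with the slider value.
print(rank, round(rank / len(sample) * 100, 1))
```

Logging this rank (or its percentage) next to each detection would show how far the slider has to move before a given sample is filtered.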
@ehpersonal38 & @bdoner
I also think it would be good to show each individual detection's "Human" prediction placement in some sort of debug mode, so if I see I have a detection with human voices in it, I can check the "Human" value of that detection and adjust my slider accordingly.
This is actually already in place -- you'll find a log of the human detections in the "HUMAN.txt" file in ~/BirdNET-Pi. You should be able to use that for debugging right away.
I'm still up for the challenge. As originally this will still get rid of the privacy_server.py script and the filtering will happen in server.py.
Great!
While on it, I would assume the PRIVACY_MODE setting should be removed from birdnet.conf and the update_birdnet.sh script should take care of ensuring that the birdnet_server.service doesn't reference the privacy_server.py.
You are correct; however, take a look at the most recent changes to update_birdnet.sh, as it now calls a secondary script where we'll place system changes. I want to keep update_birdnet.sh as a file that only handles updating the repo with git, then update_birdnet_snippets.sh will handle other update tasks. This way, when updating, folks will always pull the latest update_birdnet_snippets.sh when they run update_birdnet.sh, so there will never be another scenario wherein the user will have to update twice to get updates.
SO, we'll want the changes to birdnet.conf to be in update_birdnet_snippets.sh.
We'll also need to update advanced.php to have a new slider and no longer have the binary choice.
We'll need to update birdnet.conf-defaults and install_config.sh to add the new variable during installation.
Then lastly, we'll get @CaiusX to have server.py read from birdnet.conf to get that PRIVACY_THRESHOLD variable.
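For the last step, reading the variable could look roughly like the sketch below. It assumes birdnet.conf is a plain file of KEY=VALUE lines; the helper names are made up for illustration and this is not the code that ended up in server.py.

```python
from pathlib import Path


def read_conf_value(conf_text, key, default=None):
    """Return the value of KEY from KEY=VALUE text, or default if missing."""
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip().strip('"')
    return default


def read_privacy_threshold(conf_path):
    """Read PRIVACY_THRESHOLD as a float; fall back to 0 (filter off)."""
    path = Path(conf_path).expanduser()
    text = path.read_text() if path.exists() else ""
    try:
        return float(read_conf_value(text, "PRIVACY_THRESHOLD", "0"))
    except ValueError:
        # An unparsable value is treated like a missing one.
        return 0.0


print(read_conf_value("PRIVACY_MODE=off\nPRIVACY_THRESHOLD=2\n", "PRIVACY_THRESHOLD"))
```

Falling back to 0 when the variable is missing or malformed keeps older installations working until the update script has added the new line.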
Should I base the changes on the master or forms branch?
:) Neither of those are branches :)
BUT, I'll make a branch from main that we'll use for this. You will be able to check it out by issuing:
git pull
git checkout server_merger
once I've created the branch -- You'll see it appear soon!
The server_merger branch now has everything in place to start accepting the PRIVACY_THRESHOLD variable (at least I think everything is in place).
Things left to do:
server.py read the PRIVACY_THRESHOLD variable -- @CaiusX
advanced.php restart birdnet_server.service when the PRIVACY_THRESHOLD value is updated
Hi @mcguirepr89, @bdoner and @ehpersonal38
In order to not work on the same thing simultaneously, I will start:
A further thing I would love to see in this major update is weather data being populated automatically on an hourly basis. Not sure where the best place to do this is (@mcguirepr89, you had it working in the BirdNET-Analyzer project, maybe it's a simple inclusion in BirdNET-Pi?)
Best Dennis
Hi @mcguirepr89
I have server_merger running on my main machine. When I update the Privacy Threshold under Advanced Settings, it just jumps back to 13%.
The thisrun text file is where I'd want to pick up the privacy setting. Currently the field there is still PRIVACY_MODE=off; I'd rather use PRIVACY_SETTING= integer
Best
Hiya!
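There may be no PRIVACY_THRESHOLD= variable in your birdnet.conf.
Run this:
# Append a default only if the variable is not already in the config file
if ! grep -q PRIVACY_THRESHOLD ~/BirdNET-Pi/birdnet.conf; then
  echo "PRIVACY_THRESHOLD=0" >> ~/BirdNET-Pi/birdnet.conf
fi
and see if that clears things up.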
* At this stage I don't intend to save any human audio (nip privacy in the bud at the earliest opportunity) - but if anyone thinks there is a use case for this, we can consider including
No human audio should be saved once the privacy threshold has been reached. A lot of European countries, especially Germany, have some very strict rules about data collection and privacy. Storing the audio that reaches the threshold all of a sudden makes BirdNET-Pi a human voice logger device 😅
@CaiusX
A further thing I would love to see in this major update is weather data being populated automatically on an hourly basis. Not sure where the best place to do this is (@mcguirepr89, you had it working in the BirdNET-Analyzer project, maybe it's a simple inclusion in BirdNET-Pi?)
Sorry I didn't see that before -- yeah it can just be taken from that repo just as it is I would assume. I'm happy to add that in, but I will probably do that as a separate branch and merge it just to keep changes a little modular in case anything needs to be reverted.
Cool,
Hey @mcguirepr89
I've got a server detecting humans running on my main system if you want to pop into it. A couple of points:
If I change the Privacy_Threshold in the web interface, it seems to hang the system after the update.
I've had to scale the incoming threshold number by dividing it by 10 at this stage to give resolution at the lower end.
I still need to do the DB side, but it is logging to Human.txt.
Going to watch Spinal Tap now, have a great day!!
Best AP
I wanted to give it a look to see what it would take to expose the collected data with some kind of RESTful API.
I took note that server.py and privacy_server.py are very, very similar, so may I suggest merging privacy_server into server and controlling whether privacy mode is enabled with a command line argument in the birdnet_server.service instead of replacing the script executed.
The main reason would be to avoid doing double development every time there's a change to either of the scripts that's applicable to both.
I can do it, I just don't want to spend the time if there's no interest or if there are any good counterarguments.
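To make the suggestion concrete, here is a minimal sketch of a single script that switches privacy filtering on with a command line flag. The `--privacy` flag name and the `parse_args` helper are hypothetical; nothing here is taken from the existing scripts.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a single, merged server script."""
    parser = argparse.ArgumentParser(description="Analysis server (sketch)")
    # Hypothetical flag: present -> privacy filtering on, absent -> off.
    parser.add_argument("--privacy", action="store_true",
                        help="discard samples predicted to contain human sounds")
    return parser.parse_args(argv)


args = parse_args(["--privacy"])
print(args.privacy)
```

The service file would then always start the same script and only add or drop the flag, so a fix to the shared code is made once.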