Suggestion: merge server.py and privacy_server.py

svc-user commented 2 years ago

I wanted to give it a look to see what it would take to expose the collected data with some kind of RESTful API.

I noticed that server.py and privacy_server.py are very similar, so may I suggest merging privacy_server.py into server.py and controlling whether privacy mode is enabled with a command-line argument in birdnet_server.service, instead of replacing the script that is executed?

The main reason would be to avoid duplicating development every time a change to either script applies to both.

I can do it; I just don't want to spend the time if there's no interest or if there are good counterarguments.

CaiusX commented 2 years ago

I think this is a great idea.

svc-user commented 2 years ago

I have forked the repo and have a branch made with the change. I haven't tested it yet, but will do later.

CaiusX commented 2 years ago

I'm happy to do some testing as well, create a pull request on GitHub when you're ready.

mcguirepr89 commented 2 years ago

This is a great idea. @CaiusX and I have discussed how to implement this change, and have landed on a method slightly different than the one you proposed.

Instead of having server.py implement privacy based on the presence of an argument/flag/parameter, we think it would be best to have a user-adjustable "Privacy Threshold."

The idea will be a slider in "Tools" > "Settings" > "Advanced Settings" that will adjust the "Privacy Threshold," allowing the number (calculated as a percentage of the labels.txt file [this takes into account short custom species lists]) to be adjusted from its default, 0, up to 50% of the species list. <---This is the proposed range while the actual range may differ based on what works.

The Privacy Threshold will be a new birdnet.conf variable server.py will use to adjust its sensitivity to human sounds.

Right now, enabling the "Privacy Mode" in effect sets the "Privacy Threshold" to 100, meaning if Human is predicted for the audio sample anywhere within the top 100 predictions, the prediction is labeled as "HUMAN" and no audio from that sample is retained.

The proposed shift would have the "Privacy Threshold" calculate a percentage of the labels.txt file. For example, to approximate the current "Privacy Mode" setting of 100, the proper "Privacy Threshold" would be ~1.6%. For practical purposes, the slider will likely increase the "Privacy Threshold" percentage value by 2%.

Example 1:
PRIVACY_THRESHOLD=2%
Number of species in labels.txt=6362
0.02 * 6362 = 127.24 species
Rounded = 127 species

If a Human is predicted anywhere among the top 127 predictions, the sample will be considered of human origin and no data will be collected.

Example 2:
PRIVACY_THRESHOLD=4%
Number of species in labels.txt=6362
0.04 * 6362 = 254.48 species
Rounded = 254 species

If a Human is predicted anywhere among the top 254 predictions, the sample will be considered of human origin and no data will be collected.

Example 3, using a custom_species_list.txt:
PRIVACY_THRESHOLD=50%
Number of species in custom_species_list.txt=120
0.5 * 120 = 60 species

If a Human is predicted anywhere among the top 60 predictions, the sample will be considered of human origin and no data will be collected.
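The three examples above reduce to one small calculation. Here is a minimal Python sketch of it, assuming the threshold is stored as a percentage and the predictions arrive as a list of labels sorted from most to least confident. The names `privacy_cutoff` and `is_human` are hypothetical and are not taken from server.py, and matching on the substring "human" is an assumption about how the label is spelled.

```python
from typing import Sequence


def privacy_cutoff(threshold_percent: float, num_labels: int) -> int:
    """Turn a percentage of the species list into a number of top predictions."""
    return round(threshold_percent / 100 * num_labels)


def is_human(ranked_labels: Sequence[str], cutoff: int) -> bool:
    """Return True if a human label appears among the top `cutoff` predictions.

    `ranked_labels` is assumed to be sorted from most to least confident.
    A cutoff of 0 never matches, which corresponds to the default setting.
    """
    return any("human" in label.lower() for label in ranked_labels[:cutoff])


if __name__ == "__main__":
    print(privacy_cutoff(2, 6362))   # Example 1
    print(privacy_cutoff(4, 6362))   # Example 2
    print(privacy_cutoff(50, 120))   # Example 3
```

With a cutoff of 0 the slice is empty, so nothing is ever treated as human and the behaviour matches having privacy filtering switched off.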

Hopefully that explanation isn't too complicated -- I don't feel I've done the best job explaining, but hopefully you'll see that the shift is from a binary choice (Privacy Mode enabled or disabled) to an adjustable threshold (Privacy Threshold values from 0% - 50%)

Thanks again for the great idea and observation

svc-user commented 2 years ago

> Instead of having server.py implement privacy based on the presence of an argument/flag/parameter, we think it would be best to have a user-adjustable "Privacy Threshold."

This makes good sense. I can imagine a scenario where distant chatter is inevitable because of a semi-urban location, but you still want very clear human speech filtered out, in which case tweaking the sensitivity makes sense.

> The idea will be a slider in "Tools" > "Settings" > "Advanced Settings" that will adjust the "Privacy Threshold," allowing the number (calculated as a percentage of the labels.txt file [this takes into account short custom species lists]) to be adjusted from its default, 0, up to 50% of the species list. <---This is the proposed range while the actual range may differ based on what works.
>
> The Privacy Threshold will be a new birdnet.conf variable server.py will use to adjust its sensitivity to human sounds.

Mhmm nods.

> Right now, enabling the "Privacy Mode" in effect sets the "Privacy Threshold" to 100, meaning if Human is predicted for the audio sample anywhere within the top 100 predictions, the prediction is labeled as "HUMAN" and no audio from that sample is retained.
>
> The proposed shift would have the "Privacy Threshold" calculate a percentage of the labels.txt file. For example, to approximate the current "Privacy Mode" setting of 100, the proper "Privacy Threshold" would be ~1.6%. For practical purposes, the slider will likely increase the "Privacy Threshold" percentage value by 2%.
>
> Example 1:
> PRIVACY_THRESHOLD=2%
> Number of species in labels.txt=6362
> 0.02 * 6362 = 127.24 species
> Rounded = 127 species
>
> If a Human is predicted anywhere among the top 127 predictions, the sample will be considered of human origin and no data will be collected.
>
> Example 2:
> PRIVACY_THRESHOLD=4%
> Number of species in labels.txt=6362
> 0.04 * 6362 = 254.48 species
> Rounded = 254 species
>
> If a Human is predicted anywhere among the top 254 predictions, the sample will be considered of human origin and no data will be collected.

Great examples. I had to look through the code again with the above examples in mind, and this makes perfect sense. Looking at the predict method, I have a rough idea of how this would be implemented.

> Example 3, using a custom_species_list.txt:
> PRIVACY_THRESHOLD=50%
> Number of species in custom_species_list.txt=120
> 0.5 * 120 = 60 species
>
> If a Human is predicted anywhere among the top 60 predictions, the sample will be considered of human origin and no data will be collected.

How the custom species list works is not immediately obvious to me, but I'll figure it out :)

> Hopefully that explanation isn't too complicated -- I don't feel I've done the best job explaining, but hopefully you'll see that the shift is from a binary choice (Privacy Mode enabled or disabled) to an adjustable threshold (Privacy Threshold values from 0% - 50%)
>
> Thanks again for the great idea and observation

I'm still up for the challenge. As originally proposed, this will still get rid of the privacy_server.py script, and the filtering will happen in server.py.

While I'm at it, I assume the PRIVACY_MODE setting should be removed from birdnet.conf, and the update_birdnet.sh script should ensure that birdnet_server.service no longer references privacy_server.py.

Should I base the changes on the master or forms branch?

ehpersonal38 commented 2 years ago

I also think it would be good to show each individual detection's "Human" prediction placement in some sort of debug mode, so if I see I have a detection with human voices in it, I can check the "Human" value of that detection and adjust my slider accordingly.
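To illustrate what such a debug value could look like, here is a rough Python sketch. It assumes the predictions are available as a list of labels sorted from most to least confident. `human_rank` and `threshold_percent_for_rank` are hypothetical helpers, not existing BirdNET-Pi functions.

```python
from typing import Optional, Sequence


def human_rank(ranked_labels: Sequence[str]) -> Optional[int]:
    """Return the 1-based position of the first human label, or None if absent."""
    for position, label in enumerate(ranked_labels, start=1):
        if "human" in label.lower():
            return position
    return None


def threshold_percent_for_rank(rank: int, num_labels: int) -> float:
    """Smallest percentage of the species list that would still cover `rank`."""
    return rank / num_labels * 100


if __name__ == "__main__":
    rank = human_rank(["Robin", "Wren", "Human"])
    print(rank, threshold_percent_for_rank(rank, 6362))
```

A detection whose "Human" placement is 100 in a 6362-line labels.txt would need a threshold of roughly 1.6% to be filtered, so the slider could be moved just past that value.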

svc-user commented 2 years ago

Maybe, for a start, the human confidence could be logged in the database. I agree that it shouldn't always be visible, as it's probably not something you're interested in unless you're tweaking the setting.

mcguirepr89 commented 2 years ago

@ehpersonal38 & @bdoner

> I also think it would be good to show each individual detection's "Human" prediction placement in some sort of debug mode, so if I see I have a detection with human voices in it, I can check the "Human" value of that detection and adjust my slider accordingly.

This is actually already in place -- you'll find a log of the human detections in the "HUMAN.txt" file in ~/BirdNET-Pi. You should be able to use that for debugging right away.

> I'm still up for the challenge. As originally proposed, this will still get rid of the privacy_server.py script, and the filtering will happen in server.py.

Great!

> While I'm at it, I assume the PRIVACY_MODE setting should be removed from birdnet.conf, and the update_birdnet.sh script should ensure that birdnet_server.service no longer references privacy_server.py.

You are correct; however, take a look at the most recent changes to update_birdnet.sh, as it now calls a secondary script where we'll place system changes. I want to keep update_birdnet.sh as a file that only handles updating the repo with git, and update_birdnet_snippets.sh will handle the other update tasks. This way, folks will always pull the latest update_birdnet_snippets.sh when they run update_birdnet.sh, so there will never be another scenario in which the user has to update twice to get updates.

SO, we'll want the changes to birdnet.conf to be in update_birdnet_snippets.sh.

We'll also need to update advanced.php to have a new slider and no longer have the binary choice.

We'll need to update birdnet.conf-defaults and install_config.sh to add the new variable during installation.

Then lastly, we'll get @CaiusX to have server.py read from birdnet.conf to get that PRIVACY_THRESHOLD variable.
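As a rough idea of that last step, here is a minimal Python sketch. It assumes birdnet.conf is a plain file of `KEY=value` lines, optionally quoted, with `#` comments. `parse_conf` and `privacy_threshold` are hypothetical names, and falling back to 0 when the variable is missing or malformed is an assumption, not a decision made in this thread.

```python
def parse_conf(text: str) -> dict:
    """Parse shell-style KEY=value lines into a dict, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so PRIVACY_THRESHOLD="4" and PRIVACY_THRESHOLD=4 agree.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def privacy_threshold(conf: dict, default: float = 0.0) -> float:
    """Return PRIVACY_THRESHOLD as a float, or `default` if missing or not numeric."""
    try:
        return float(conf.get("PRIVACY_THRESHOLD", default))
    except ValueError:
        return default


if __name__ == "__main__":
    sample = '# example only\nPRIVACY_THRESHOLD="4"\n'
    print(privacy_threshold(parse_conf(sample)))
```

Parsing from a string rather than a path keeps the sketch independent of where birdnet.conf actually lives on a given installation.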

> Should I base the changes on the master or forms branch?

:) Neither of those are branches :) BUT, I'll make a branch from main that we'll use for this. You will be able to check it out by issuing:

git pull
git checkout server_merger

once I've created the branch --

You'll see it appear soon!

mcguirepr89 commented 2 years ago

Update

The server_merger branch now has everything in place to start accepting the PRIVACY_THRESHOLD variable (at least I think everything is in place).

Things left to do:

[ ] Have server.py read the PRIVACY_THRESHOLD variable -- @CaiusX
[x] Have advanced.php restart birdnet_server.service when the PRIVACY_THRESHOLD value is updated
[x] Add some info for how to use the new privacy settings in "Advanced Settings"
[ ] and possible documentation in the Wiki
[ ] Test a clean installation
[ ] Test an update using "Tools" > "System Controls" > "Update"

CaiusX commented 2 years ago

Hi @mcguirepr89 , @bdoner and @ehpersonal38

In order to not work on the same thing simultaneously, I will start:

Read PRIVACY THRESHOLD into server.py
Update Human Recognition section in server.py and do some cleaning up
Report Human detections to a Human table in the birds.db SQL database with Fields - Date, Time, Cutoff, Confidence, Privacy Threshold
At this stage I don't intend to save any human audio (nip privacy in the bud at the earliest opportunity) - but if anyone thinks there is a use case for this, we can consider including
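A minimal sketch of what the Human table could look like, using Python's built-in `sqlite3` module. The table name `humans`, the column types, and the helper names are assumptions for illustration only. Only the field list (Date, Time, Cutoff, Confidence, Privacy Threshold) comes from the plan above, and no audio is stored.

```python
import sqlite3
from datetime import datetime
from typing import Optional


def ensure_human_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) humans table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS humans (
               Date TEXT,
               Time TEXT,
               Cutoff INTEGER,
               Confidence REAL,
               Privacy_Threshold REAL
           )"""
    )


def log_human(conn: sqlite3.Connection, cutoff: int, confidence: float,
              privacy_threshold: float, when: Optional[datetime] = None) -> None:
    """Record one human detection; only metadata is written, never audio."""
    when = when or datetime.now()
    conn.execute(
        "INSERT INTO humans VALUES (?, ?, ?, ?, ?)",
        (when.strftime("%Y-%m-%d"), when.strftime("%H:%M:%S"),
         cutoff, confidence, privacy_threshold),
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # the real target would be birds.db
    ensure_human_table(connection)
    log_human(connection, cutoff=127, confidence=0.03, privacy_threshold=2.0)
    print(connection.execute("SELECT * FROM humans").fetchall())
```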

A further thing I would love to see in this major update is weather data being populated automatically on an hourly basis. I'm not sure where the best place to do this is (@mcguirepr89, you had it working in the BirdNET-Analyzer project, so maybe it's a simple inclusion in BirdNET-Pi?).

Best Dennis

CaiusX commented 2 years ago

Hi @mcguirepr89

I have server_merger running on my main machine. When I update the Privacy Threshold under Advanced Settings, it just jumps back to 13%.

The thisrun text file is where I'd want to pick up the privacy setting. Currently the field there is still PRIVACY_MODE=off; I'd rather use a PRIVACY_SETTING=<integer> field.

Best

mcguirepr89 commented 2 years ago

Hiya!

There may be no PRIVACY_THRESHOLD= variable in your birdnet.conf.

run this:

# Append a default only if the variable is missing from birdnet.conf
if ! grep -q PRIVACY_THRESHOLD ~/BirdNET-Pi/birdnet.conf; then
  echo "PRIVACY_THRESHOLD=0" >> ~/BirdNET-Pi/birdnet.conf
fi

and see if that clears things up

svc-user commented 2 years ago

> At this stage I don't intend to save any human audio (nip privacy in the bud at the earliest opportunity) - but if anyone thinks there is a use case for this, we can consider including

No human audio should be saved once the privacy threshold has been reached. A lot of European countries, especially Germany, have very strict rules about data collection and privacy. Storing the audio that reaches the threshold would all of a sudden make BirdNET-Pi a human voice logger device 😅

mcguirepr89 commented 2 years ago

@CaiusX

> A further thing I would love to see in this major update is weather data being populated automatically on an hourly basis. I'm not sure where the best place to do this is (@mcguirepr89, you had it working in the BirdNET-Analyzer project, so maybe it's a simple inclusion in BirdNET-Pi?).

Sorry I didn't see that before -- yeah it can just be taken from that repo just as it is I would assume. I'm happy to add that in, but I will probably do that as a separate branch and merge it just to keep changes a little modular in case anything needs to be reverted.

CaiusX commented 2 years ago

Cool,

CaiusX commented 2 years ago

Hey @mcguirepr89

I've got a server detecting humans running on my main system if you want to pop into it - couple of points

If I change the Privacy_Threshold in the web interface, it seems to hang the system after the update.
For now I've had to scale the incoming threshold number by dividing it by 10, to give more resolution at the lower end.
I still need to do the DB side, but it is logging to Human.txt.
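To make the scaling concrete, here is a small Python sketch. It assumes the slider delivers an integer that is divided by 10 to get the percentage, so one slider step is 0.1% of the species list. The function name and the exact order of operations are guesses, not the code running on that system.

```python
def cutoff_from_slider(slider_value: int, num_labels: int, scale: int = 10) -> int:
    """Convert a raw slider value into a top-N cutoff.

    The slider value is divided by `scale` to get a percentage, so with
    scale=10 a slider value of 20 means 2.0% of the species list.
    """
    percent = slider_value / scale
    return round(percent / 100 * num_labels)


if __name__ == "__main__":
    for value in (1, 13, 20):
        print(value, cutoff_from_slider(value, 6362))
```

With a 6362-line labels.txt, each slider step then moves the cutoff by about 6 predictions instead of about 64, which is where the extra resolution at the low end comes from.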

Going to watch Spinal Tap now, have a great day!!

Best AP

mcguirepr89 / BirdNET-Pi

Suggestion: merge server.py and privacy_server.py #240