mcguirepr89 / BirdNET-Pi

A realtime acoustic bird classification system for the Raspberry Pi 4B, 3B+, and 0W2 built on the TFLite version of BirdNET.
Other
1.28k stars 137 forks source link

Experimenting with RPi Zero 2W #135

Closed DD4WH closed 2 years ago

DD4WH commented 2 years ago

Patrick, thanks again for the accurate installation guide for RPi Zero 2W. HERE I open this issue to have a place specifically for feedback on how BirdNET-Pi v0.11 (forms branch) is performing on this hardware. Feel free to close this issue or move it to another -more appropriate- place :-).

These are my experiences running BirdNET-Pi for half a day now (changed overlap to 0.5 and min conf to 0.5, language to german, set appropriate URL IP adress to successfully connect via Android phone):

A serious issue is that after running for one hour or a bit longer, BirdNET-Pi seems to hang and no files are being processed any more. The log shows this:

Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/bin/server.py", line 315, in handle_client
    writeResultsToFile(detections, min_conf, args.o)
  File "/usr/local/bin/server.py", line 204, in writeResultsToFile
    with open(path, 'w') as rfile:
FileNotFoundError: [Errno 2] No such file or directory: '/home/pi/BirdSongs/February-2022/08-Tuesday/2022-02-08-birdnet-23:57:08.wav.csv

I experienced this three times yesterday and this night. So it seems it is a reproducable issue.

After rebooting, log and analysis still seem to be stuck (the date and time of the file still shows a very old file):

with open(path, 'w') as rfile:
FileNotFoundError: [Errno 2] No such file or directory: '/home/pi/BirdSongs/February-2022/08-Tuesday/2022-02-08-birdnet-23:57:08.wav.csv
birdnet_server.service: Main process exited, code=killed, status=9/KILL
birdnet_server.service: Failed with result 'signal'.
Stopped BirdNET Analysis Server.
birdnet_server.service: Consumed 18min 48.826s CPU time.
Started BirdNET Analysis Server.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: Created TensorFlow Lite delegate for select TF ops.
INFO: TfLiteFlexDelegate delegate: 1 nodes delegated out of 182 nodes with 1 partitions.

After RESTART ALL SERVICES it seems it has resumed analysis and shows the usual output, which confirms it really takes the real time recordings and processes without lag:

 ('Perdix perdix_Rebhuhn', 0.049771078)
 ('Perdix perdix_Rebhuhn', 0.0641239)
 ('Perdix perdix_Rebhuhn', 0.04529362)
 ('Perdix perdix_Rebhuhn', 0.051011525)
[NEW CONNECTION] ('127.0.0.1', 33360) connected.
[ACTIVE CONNECTIONS] 1
READING AUDIO DATA... DONE! READ 12 CHUNKS.
FULL FILENAME: -/home/pi/BirdSongs/February-2022/09-Wednesday/2022-02-09-birdnet-08:50:33.wav-
Date-time: 2022-02-09 08:50:33

Hope this helps in finding the issue. Keep up the excellent work!

mcguirepr89 commented 2 years ago

A serious issue is that after running for one hour or a bit longer, BirdNET-Pi seems to hang and no files are being processed any more. The log shows this:

Traceback (most recent call last):
File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/bin/server.py", line 315, in handle_client
writeResultsToFile(detections, min_conf, args.o)
File "/usr/local/bin/server.py", line 204, in writeResultsToFile
with open(path, 'w') as rfile:
FileNotFoundError: [Errno 2] No such file or directory: '/home/pi/BirdSongs/February-2022/08-Tuesday/2022-02-08-birdnet-23:57:08.wav.csv

I experienced this three times yesterday and this night. So it seems it is a reproducable issue.

The error in log excerpt above is indicating that there is something wrong with the birdnet_recording.service. The log excerpt ends the traceback with the FileNotFoundError, which means the birdnet_recording.service stopped for some reason. The next log excerpt supports this conclusion, as the "Restart ALL Services" script restarts the birdnet_recording.service and the FileNotFound error disappears.

After rebooting, log and analysis still seem to be stuck (the date and time of the file still shows a very old file):

with open(path, 'w') as rfile:
FileNotFoundError: [Errno 2] No such file or directory: '/home/pi/BirdSongs/February-2022/08-Tuesday/2022-02-08-birdnet-23:57:08.wav.csv
birdnet_server.service: Main process exited, code=killed, status=9/KILL
birdnet_server.service: Failed with result 'signal'.
Stopped BirdNET Analysis Server.
birdnet_server.service: Consumed 18min 48.826s CPU time.
Started BirdNET Analysis Server.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: Created TensorFlow Lite delegate for select TF ops.
INFO: TfLiteFlexDelegate delegate: 1 nodes delegated out of 182 nodes with 1 partitions.

If you checked these logs right after the reboot, then the portion at the top of the log excerpt above is reporting on the last thing the service did as the system was shutting down. Basically, after any reboot, if you check logs very soon thereafter (for any service on the system), you'll see what it was reporting as it was shutting down for the reboot. I can't say for certain that that is what is indicated in the log excerpt above, but it would explain the old timestamp and the successful service behavior that followed.

After RESTART ALL SERVICES it seems it has resumed analysis and shows the usual output, which confirms it really takes the real time recordings and processes without lag:

('Perdix perdix_Rebhuhn', 0.049771078)
('Perdix perdix_Rebhuhn', 0.0641239)
('Perdix perdix_Rebhuhn', 0.04529362)
('Perdix perdix_Rebhuhn', 0.051011525)
[NEW CONNECTION] ('127.0.0.1', 33360) connected.
[ACTIVE CONNECTIONS] 1
READING AUDIO DATA... DONE! READ 12 CHUNKS.
FULL FILENAME: -/home/pi/BirdSongs/February-2022/09-Wednesday/2022-02-09-birdnet-08:50:33.wav-
Date-time: 2022-02-09 08:50:33

Keeping with my original supposition, the "Restart ALL Services" script worked again in this scenario because the issues seemed to be in birdnet_recording.service getting stuck.

The next time you see the FileNotFound error, try restarting the recording service ("System" > "Tools" > "Manage Services" > "Restart Recording").

And if you have the ability to access the terminal when things are malfunctioning, could you please send the output of:

journalctl -n50 --no-pager --no-hostname -u birdnet_recording.service

Thank you for providing such great information on this issue (as I have come to expect from you :smile:) -- it is truly necessary when trying to troubleshoot that I have as much information as possible, and the more log out put I can get my hands on, the better, so thanks!

I've stopped testing the RPi0W2 for today so that I can work on other things (still only have one microphone!) so let me know what you observer as you observer it.

DD4WH commented 2 years ago

Surprisingly, the RPi Zero 2W has had no more issues since the reboot this morning . . . for 6 hours 50min now . . . Will report as soon as I recognize something strange :-).

mcguirepr89 commented 2 years ago

I love it! Thanks for keeping me posted!

DD4WH commented 2 years ago

I have ordered thermal glue (my old glue is hard as concrete by now :-) for a heatsink to be glued on the Zero CPU. It gets really hot:

grafik

mcguirepr89 commented 2 years ago

Yes I've noticed that right when you start things up, it performs very well, then the heat sets in and everything starts to get sluggish. The heatsink will be great. I wonder if there would be a way to thermally attach your heatsink to your outdoor metal enclosure? And if that enclosure were aluminum, perhaps even adhere the CPU itself to the enclosure with thermal paste? Just a tinkerer's thought.

DD4WH commented 2 years ago

Hmm, maybe it is the microUSB cable?

grafik

Does that mean it is undervoltaged ?

mcguirepr89 commented 2 years ago

Nope 0x0 means everything is running perfectly well -- no throttling no voltage issues and no CPU limits

DD4WH commented 2 years ago

Ah, yes, thanks! Pretty clear! 0x0 means that bit 0 is NOT set . . . [bit 0 seems to be the indicator for undervoltage as far as I remember]

mcguirepr89 commented 2 years ago

You just gave me a good idea -- I'm going to add vcgencmd get_throttled to the network_info.sh script that displays output on the "System" > "Info" page. Maybe I'll change network_info.sh to just info.sh and add more system info there.

DD4WH commented 2 years ago

Yes, that would be perfect to have that kind of indicator on the System info page!

mcguirepr89 commented 2 years ago

Ah, yes, thanks! Pretty clear! 0x0 means that bit 0 is NOT set . . . [bit 0 seems to be the indicator for undervoltage as far as I remember]

A 0 bit may indeed indicate an undervoltage error, but these are not bits -- this is a Hex code. The bit translation for 0 (you only use the second half of the hex for the bit indication) is:

0000

Then you need to identify which bit represents which monitored state:

From the vcgencmd manual page:

       get_throttled
              Returns the throttled state of the system. This is a bit pattern - a bit being set indicates the following
              meanings:

              Bit   Meaning
              ────  ────────────────────────────────────
               0    Under-voltage detected
               1    Arm frequency capped
               2    Currently throttled
               3    Soft temperature limit active
              16    Under-voltage has occurred
              17    Arm frequency capping has occurred
              18    Throttling has occurred
              19    Soft temperature limit has occurred

              A value of zero indicates that none of the above conditions is true.

              To find if one of these bits has been set, convert the value returned to  binary,  then  number  each  bit
              along the top. You can then see which bits are set. For example:

              0x50000 = 0101 0000 0000 0000 0000

              Adding the bit numbers along the top we get:

              19   18   17   16   15   14   13   12   11   10   9   8   7   6   5   4   3   2   1
               0    1    0    1    0    0    0    0    0    0   0   0   0   0   0   0   0   0   0

              From this we can see that bits 18 and 16 are set, indicating that the Pi has previously been throttled due
              to under-voltage, but is not currently throttled for any reason.

Then that means:

00000000000000000000
||||            ||||_ Under-voltage detected
||||            |||_ Arm frequency capped
||||            ||_ Currently throttled
||||            |_ Soft temperature limit active
||||_ Under-voltage has occurred since last reboot
|||_ Arm frequency capped has occurred
||_ Throttling has occurred
|_ Soft temperature limit has occurred

Notice, none of the bits are "on", so your system is in tip top shape!

I just realized I put all that together because I misread your last comment :) oh well, overkill I suppose

mcguirepr89 commented 2 years ago

For some reason, I'm seeing the birdnet_recording.service stop on my installation for no apparent reason :( this is no good!

mcguirepr89 commented 2 years ago

I FOUND THE BUG!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! in birdnet_analysis.sh

 19 ...
 20 make_thisrun &> /dev/null
 21 if ! diff ${LAST_RUN} ${THIS_RUN};then
 22   echo "The birdnet.conf file has changed"
 23   echo "Reloading services"
 24   cat ${THIS_RUN} > ${LAST_RUN}
 25   sudo systemctl stop birdnet_recording.service
 26   sudo rm -rf ${RECS_DIR}/$(date +%B-%Y/%d-%A)/*
 27 fi
 28 ...

I forgot to restart the birdnet_recording.service!!!!!!!!!!!!

mcguirepr89 commented 2 years ago

I pushed the fix to forms and am going to pull it into main now since it is a true bug

mcguirepr89 commented 2 years ago

I'm going to keep this open while we continue to test, but I'm pretty confident that was the reason for all of this trouble

DD4WH commented 2 years ago

Sounds really good Will do an update and report.

mcguirepr89 commented 2 years ago

heads up, the update takes a long time on the RPi0W2 -- if I were you, I would run the update via PuTTY SSH update_birdnet.sh to watch and see the update (for testing purposes) in case anything weird happens. It isn't necessary that you run the update that way, but if you do use the Web GUI for the update, allow it a few more minutes than the default 60 seconds the web page suggests.

DD4WH commented 2 years ago

Thanks! Thats a good hint, probably one of the reasons why I experienced some strange effects last evening ;-).

DD4WH commented 2 years ago

BTW, by now even the Raspberry Pi Zero 2W is unobtainable here in Germany . . .

DD4WH commented 2 years ago

Hmm, it really takes a long time to update. The Pi is now working and trying to catch up with the recordings that have been made, but not yet analyzed. The time for the analysis of a 30seconds recording with 0.5 seconds overlap is on average about 12-14 seconds. However, sometimes it needs MUCH longer, I observed up to 58 seconds for the same recording length . . . in this case (where I captured a screenshot), it needed 39 seconds grafik

Maybe an effect of the RPi still busy with updating ?

DD4WH commented 2 years ago

Another anecdotal observation:

mcguirepr89 commented 2 years ago

I haven't had great success yet with getting the RPi0W2 to truly keep up. I haven't tried any recording length over 60 seconds, but even at 60 seconds, things really get backed up once there are detections. It runs really smoothly, until it needs to do its job!

The other consideration is that there are extractions going on for detections which are not doing anything until the analysis service gives it detections. Thus you end up getting backed up when it starts working :/

I'm going to experiment with cutting down on other resources, adding cooling, and maybe overclocking a tiny bit to see if I can get something that works well when the birds get talkative.

DD4WH commented 2 years ago

I think the reason why the Zero does not manage to keep up is that some analysis events last a huge lot longer than others. Spotted another of those events with 76 seconds instead of the usual 13 seconds !

grafik

DD4WH commented 2 years ago

After two hours, it seems the RPi Zero has kept up and the 10 minute lag is now not existent any more . . . I set overlap to 0.0 sec for that purpose, so 10 chunks instead of 12 chunks have to be processed in every 30second recording. Will now go for a longterm test and observe whether the initial bug occurs again. But I am pretty confident you fixed the issue ;-) ! Thanks a lot, Patrick!

mcguirepr89 commented 2 years ago

FINGERS ARE CROSSED! :crossed_fingers:

DD4WH commented 2 years ago

just one observation: UPDATE in Zero 2W over SSH (putty on Windows and then issue update_birdnet.sh)

Oops, the RPi refused to work any more, it got too hot to touch (the whole PCB), so I pulled the power supply plug . . . Seems the update was too demanding . . . Will try to restart as soon as it cools down

mcguirepr89 commented 2 years ago

Yikes! Do this next time, before even accessing the web browser:

SSH using PuTTY, not the webterminal (just saw your edit :+1:)into the BirdNET-Pi and issue stop_core_services.sh THEN do update_birdnet.sh and see if that helps.

DD4WH commented 2 years ago

look at this --> 7.2 seconds for an analysis of 30 seconds of audio !!!

grafik

grafik

whats happening there? :-)

Is this really a function of the CPU temperature ? But why doesnt it tell us about throttling ?

DD4WH commented 2 years ago

Is this really a function of the CPU temperature ? No, it does not seem to be the case . . .

grafik

mcguirepr89 commented 2 years ago

It is very curious -- with adequate cooling, we would know if that were the issue, without it, it will be a looming factor. It is possible that vcgen_cmd doesn't report accurately for this new RPi0W2 chipset? It was the case that above 60degrees C throttling led to this exact issue for the RPi3B and placed a ARM frequency limit as a result.

I want to rule it out before I rule it out

DD4WH commented 2 years ago

hmm, its slow again every time I return from one of the other pages of BirdNET-Pi (and when I stay in the log window a little longer, the time goes down again):

grafik

mcguirepr89 commented 2 years ago

Check the temperature now and see if it has neared 60C

DD4WH commented 2 years ago

no, it is at 54 degrees right after return from the log where it said 75 seconds for one recording . . . but the 75 seconds were the first figure reported after returning from the system info window --> so I suspect that the page handling / sysinfo is quite demanding!?

grafik

DD4WH commented 2 years ago

the strange thing is: I never had durations as low as 8 seconds. Before the update and overheating I reported, the minimum was about 11-12 seconds. But maybe that is caused by your bugfix ?

DD4WH commented 2 years ago

Next week I will have thermal glue and some heatsinks at my disposal and I will do exactly the same test WITH heatsink, we will see whether the behaviour changes.

mcguirepr89 commented 2 years ago

The bugfix just restarted the birdnet_recording.service after stopping it -- I only added:

sudo systemctl start birdnet_recording.service

after the lines where I stop it and remove truncated recordings.

Assuming this isn't a thermal issue, then this has something to do with other resources taking up CPU, OR something about sending the info in the socket. I've noticed it takes an exceptionally long time to send the message to the socket (you can journalctl --no-hostname -fu birdnet_analysis.service to watch how long it takes to send each message). This is something I will try to keep a closer eye on as it may reveal when/where exactly the latency is being introduced.

mcguirepr89 commented 2 years ago

I've noticed that the livestream service takes up a good bit of CPU. You can try disabling it for now:

sudo systemctl disable --now livestream.service
DD4WH commented 2 years ago

My system seems a little awkward now. I cant write any more to any file ("no permission") will reboot and then stop for today . . . heat issue in my head ;-)

DD4WH commented 2 years ago

Reboot did not fix it. Stopped services and did an update --> only about two minutes! After that I had to Restart all services, because nothing was running. And then normal function can be observed.

DD4WH commented 2 years ago

Unfortunately, in the night, the RPi Zero 2W stopped the analysis service again. It is reachable via the usual birdnetpi.local and everything seems normal, except that the log shows the same error message again, that I observed the night before. IDs are not being made.

grafik

Will not have much time the next days, so I will stop testing the Zero at this point.

If you have any ideas/directions for further tests with the Zero, I will resume testing next week.

mcguirepr89 commented 2 years ago

Two things worth noting:

  1. I was able to run an installation from scratch directly on the RPi0W2 without changing any settings (e.g., swapping, clock speed, service settings)
  2. Disabling the livestream.service drastically reduces the CPU load outside of analysis.

Recent tweaks that have helped a lot:

  1. I have configured the birdnet_recording.service to require the server.py socket be open and ready before collecting data (to mitigate data getting backed up).
  2. Likewise, I've amalgamated the birdnet_server.service and birdnet_analysis.service logs so that the logs are more verbose and show their interaction (as well as the python call that birdnet_analysis.service uses to send the data to the server socket).

I hope to get my hands on another microphone soon, so hopefully I will be able to keep testing this alongside my main installation to keep this developing.

My best, Patrick

DD4WH commented 2 years ago

@mcguirepr89 Thanks Patrick for your efforts!

UPDATE:

DD4WH commented 2 years ago

After another 30 minutes of playing with the system, it gets really laggy and inresponsive and then does not respond any more at all.

mcguirepr89 commented 2 years ago

@DD4WH I really appreciate your efforts. I'm going to test this intermittently as shifts towards efficiency and reduced resource usage come to mind. It seems SO close to being able to work, but invariably it gets bogged down from normal use. It may be that a solution could be to bypass the database and PHP altogether and have a system more like BirdNET-system wherein there is simply recordings and extractions with a file server. That's a much simpler "stack" and can still have BirdDB.txt for exporting the data and importing it into another system's database (maybe to have RPi0W2 collecting and sorting data, then a RPi4B houses the database and all its PHP visualizations etc.:thinking:)

mcguirepr89 commented 2 years ago

Hello!

Update: I am working on a new branch that will switch BirdNET-Pi over to using a single-file SQLite database to do away with the MariaDB server altogether. I have started testing with this and have already noticed that the RPi4B handles the database entries with MUCH more ease using the SQLite database, so I'm very hopeful that this is a step in the right direction in making things work on the RPi0W2.

DD4WH commented 2 years ago

Sounds very good! Let me know as soon as you want me to test the new branch!

mcguirepr89 commented 2 years ago

I will! But it won't be for a little while -- I have a lot of testing I'd like to do before I ask any others to join in. In typical crappy-development-praxis-Patrick fashion, I will be working on the SQLite at the same time as a new, simplified installation (the user won't really notice a difference).

:crossed_fingers:

DD4WH commented 2 years ago

Testing the Zero 2W with the sqlite version of today:

After an hour of playing the Pi gets really slow and unresponsive . . .

It seems it is under 100% load, although the temperature of the CPU does not indicate that:

throttled is still at 0x0

grafik

DD4WH commented 2 years ago

OK, now the Pi is so slow and unresponsive, that it does not make sense to continue further testing.

I think I will now try to glue on some heatsinks onto the CPU and then retry with a cold start.

DD4WH commented 2 years ago

many problems after restart:

The Pi is still slow to respond, needs some seconds to load a page, although the temperature is much lower now with the heatsink, has not exceeded 45 degrees