timmo001 / system-bridge

A bridge for your systems
https://system-bridge.timmo.dev
Apache License 2.0

Linux - Application runs for a few hours then crashes with: "OSError: [Errno 24] Too many open files" #1876

Closed Anto79-ops closed 1 year ago

Anto79-ops commented 2 years ago

Description

Hi!

Running the program on Ubuntu 22.04 and using the web URL to access the data.

└─5042 /usr/bin/python3.10 -m systembridgebackend --no-gui

Everything runs fine for a few hours, data is being generated and updated, but then it stops with the error message below:


Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/systembridgebackend/modules/>
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:     self._bridge = Bridge(self.service_changed)
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/systembridgebackend/modules/>
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/zeroconf/_core.py", line 450>
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/zeroconf/_utils/net.py", lin>
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/zeroconf/_utils/net.py", lin>
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/zeroconf/_utils/net.py", lin>
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]:   File "/home/anto/.local/lib/python3.10/site-packages/ifaddr/_posix.py", line 56, >
Jul 22 07:42:15 GC676AA-ABA-m8150n python3.10[5042]: OSError: [Errno 24] Too many open files

What Platform / OS are you running?

Linux

What version are you running?

pyinstaller-5.2 sanic-ext-22.6.2 systembridgebackend-3.4.1 systembridgecli-3.4.1 systembridgefrontend-3.4.1 systembridgegui-3.4.1 systembridgeshared-3.4.1 systembridgewindowssensors-3.4.1 typer-0.6.1

Anything in the logs or a references that might be useful?

Successfully installed pyinstaller-5.2 sanic-ext-22.6.2 systembridgebackend-3.4.1 systembridgecli-3.4.1 systembridgefrontend-3.4.1 systembridgegui-3.4.1 systembridgeshared-3.4.1 systembridgewindowssensors-3.4.1 typer-0.6.1

Additional information

No response

ScottG489 commented 2 years ago

I had the same issue and reported it as a standalone comment - https://github.com/timmo001/system-bridge/issues/1831#issuecomment-1176809830.

It's likely still an issue for me, but since I haven't gotten the service working yet for other reasons, I haven't run it for an extended period recently, which seems necessary to reproduce the issue.

Anto79-ops commented 2 years ago

I can confirm that this is still happening and is reproducible. It happens within the first 5 hours of starting the service. Please let me know if you need more data from my end.

As for the HA side of things, it looks like there are some requested changes to the PR before it can be merged into HA:

https://github.com/home-assistant/core/pull/75362#pullrequestreview-1059870510

cheers @timmo001

timmo001 commented 2 years ago

As for the HA side of things, it looks like there are some requested changes to the PR before it can be merged into HA:

home-assistant/core#75362 (review)

Unrelated

timmo001 commented 2 years ago

This appears to only occur on Linux, and I don't have a dedicated Linux system to test with.

Anto79-ops commented 2 years ago

thanks, @timmo001

I've reached out to the Ubuntu forums; perhaps they can shed some light on this:

https://ubuntuforums.org/showthread.php?t=2477750&p=14106978#post14106978

Anto79-ops commented 2 years ago

also, I found this:

https://stackoverflow.com/questions/68784048/oserror-errno-24-too-many-open-files-in-python-difficult-to-debug

Do you think it's just a matter of increasing the limit? Any ideas as to what it should be?

Anto79-ops commented 2 years ago

FYI, when I check my system

anto@GC676AA-ABA-m8150n:~$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15425
max locked memory           (kbytes, -l) 502296
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15425
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

It seems I have a limit of 1024 open files.

Or does system_bridge close the files after it opens them (if files are being opened)?
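
For reference, the same limit can be read from inside Python with the standard-library `resource` module. This is only a minimal sketch: the helper names `nofile_limits` and `describe` are made up for illustration and are not part of System Bridge. Note that `ulimit -n` reports the soft limit of the current shell, and a service started by a service manager may run with a different limit, so checking from inside the process can be more reliable.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = nofile_limits()
print(f"soft limit: {describe(soft)}, hard limit: {describe(hard)}")
```

The soft limit is the one that triggers "[Errno 24] Too many open files"; the hard limit is the ceiling up to which an unprivileged process may raise its own soft limit.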

timmo001 commented 2 years ago

It should, but there may be something else causing this. Potentially a fix could be to increase this limit

Anto79-ops commented 2 years ago

It should, but there may be something else causing this. Potentially a fix could be to increase this limit

Okay, I'll test it by doubling my open-file limit. If that just doubles the time before it crashes, then you're right and there could be something there; but if it works, then it works.

One thing I did notice is that my Ubuntu machine has an old spinning hard drive, and whenever System Bridge is running the drive is often busy, suggesting quite a bit of disk activity.

I'll keep you posted.

Anto79-ops commented 2 years ago

Just a quick update: I have not made much progress on this. It turns out you can increase the open-file limit, but there are at least three different ways of doing it: per process, per user, and system-wide.

Being not so fluent in Linux, I just have to rely on others to direct me...so I'll post something when I get new information.

@ScottG489 I don't know if you are familiar with this, perhaps you can try on your Linux system?

EDIT: By the way, @ScottG489, what are your open-file limits? 1024 like mine? Typing ulimit -a will tell you.
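
Of the three routes mentioned above, the per-process one can be sketched in Python as follows. This is an illustration under assumptions, not something System Bridge does: `raise_soft_nofile_limit` is a hypothetical helper, and 2048 is just the "doubled" value discussed in this thread. If descriptors are being leaked, a higher limit would only postpone the error.

```python
import resource


def raise_soft_nofile_limit(target):
    """Raise the soft open-file limit to `target`, capped at the hard limit.

    Returns (old_soft, new_soft). An existing limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may not exceed its hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return soft, target


old, new = raise_soft_nofile_limit(2048)
print(f"soft limit was {old}, is now {new}")
```

The change applies only to the calling process and the children it starts afterwards; the per-user and system-wide limits are configured outside the application.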

ScottG489 commented 2 years ago

I don't have access to the exact machine at the moment, but another Linux machine has a limit of 1024 for file descriptors (ulimit -n).

However, increasing the system's limit on file descriptors isn't the appropriate solution. This is an application-side issue and needs to be fixed here.

Anto79-ops commented 2 years ago

@ScottG489 you're possibly right, there could be a bug in the code, and @timmo001 mentioned that could very well be the case (and some on the Ubuntu forums are saying it's the code).

The reason I'm willing to try increasing the open-file limit is that the system I'm running System Bridge on is also my MQTT broker and InfluxDB database, so it's doing some other stuff.

If your Linux system is barely being used and it's still reaching the 1024 limit, then I'm more inclined to believe that there's a leak in the System Bridge code, and not an issue with the OS. I guess we will figure this out eventually :)

Anto79-ops commented 2 years ago

This appears to only occur on linux, which I don't have a dedicated system to test with

Hi @timmo001

Can I make a suggestion? You can install Ubuntu 22.04 on a Windows computer as a dual boot. I've done this, and it's pretty simple: Ubuntu will recognize the other OS and install alongside it, and then on restart you'll get a GRUB screen to pick which OS you want to boot. This way, you might be able to get a better understanding of these two issues:

https://github.com/timmo001/system-bridge/issues/1928 https://github.com/timmo001/system-bridge/issues/1876

Unless it's just my system. @ScottG489, have you gotten System Bridge to work with the new updates?

This is a really awesome integration; it's just too bad I can't get it to work with my Linux box!

ScottG489 commented 2 years ago

Hey @Anto79-ops, thanks for keeping on top of this.

I believe the last version I tried things on was 3.4.3. I don't see any changes in the most recent version (3.4.4) that would indicate a fix, but I'll try again soon.

I'm also curious whether anyone on Linux has gotten this to work. Have you ever had it working on Linux, @Anto79-ops? @timmo001 not having access to a Linux box, even in a VM, makes me wonder how stable this is on Linux. However, I haven't taken a look at the test setup for this project, so it could have adequate coverage there.

Anto79-ops commented 2 years ago

@ScottG489 you're the only other Linux user I know of who is using System Bridge.

I have the same version as you. However, just yesterday Home Assistant merged some of @timmo001's PRs, which were supposed to solve a second issue of not seeing entities in HA. After updating to the latest HA core, I'm still not seeing entities in HA.

If you're not using HA then it's not that important for you, but the first issue will be relevant to you.

timmo001 commented 2 years ago

I've been running System Bridge on a headless Debian server with no issues. It's added to HA and showing entities, which is why these issues are confusing to me.

Anto79-ops commented 2 years ago

I think the next step would be to reach out to other forums.

I'll post my issues on the HA forums; perhaps someone can confirm or spot something.

It's a great community, and things usually get solved.

@timmo001 just curious, do the errors posted in my two issues make sense to you?

Anto79-ops commented 2 years ago

Also, you mentioned that it's working on your headless Debian. Perhaps I can also try a headless Ubuntu, if that makes sense?

I mean, it has a desktop when I connect a monitor to it, but there is no monitor connected and I simply SSH into it when I need to.

Could this be related?

ScottG489 commented 2 years ago

I think at a high level it would be a good idea to say what distros and versions this has been formally tested against. To troubleshoot the existing issues, we should try looking at the versions of dependencies that differ between systems where this project works and not. Then try looking at any other environmental differences.

I'm not sure if there's a docker offering for this project, but that would eliminate a lot if not all of these kinds of problems.

Also I do use HA, but since I hadn't gotten the service running stably, I didn't bother trying to integrate it yet.

renarena commented 2 years ago

Same error on Linux Mint Debian Edition 5 + Cinnamon.

After the error occurred, System Monitor showed memory usage above 500 MB. After a restart it was about 40 MB.

In the list of open files for Python tasks, systembridgebackend started with 40 files; now, after 20 minutes, there are 186 files, most of them of type "unknown". Memory used is 185 MB.
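
A growth pattern like the one reported above (40 open files at start, 186 after 20 minutes) can be checked by sampling the descriptor count at intervals. The sketch below is Linux-only and reads `/proc/<pid>/fd`; the function names and the "never drops and ends higher" heuristic are assumptions made for illustration, not a definitive leak test.

```python
import os
import time


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid, samples=5, interval=1.0):
    """Collect `samples` descriptor counts, `interval` seconds apart."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:
            time.sleep(interval)
    return counts


def looks_like_leak(counts):
    """Heuristic: the count never drops and ends higher than it started."""
    if len(counts) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_drops and counts[-1] > counts[0]


# Demo on this script's own process; pass the PID of the real service instead.
counts = sample_fd_counts(os.getpid(), samples=3, interval=0.1)
print(counts, "possible leak" if looks_like_leak(counts) else "no steady growth")
```

For a real check the interval would be minutes rather than fractions of a second, and reading another user's process may require elevated permissions.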

github-actions[bot] commented 1 year ago

There hasn't been any activity on this issue recently, so we clean up some of the older and inactive issues. Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thanks!

ScottG489 commented 1 year ago

Still an issue

Anto79-ops commented 1 year ago

Yes it's still an issue. I've removed the integration from both home assistant and my Ubuntu system.

This issue still persists along with increased disk activity and memory usage.

I'd be happy to try again later

timmo001 commented 1 year ago

This seems to be Linux specific and since I do not daily drive a Linux system, I would need help in diagnosing this issue.

It seems it could be a wider Python issue/limitation from what I've read so far

Brix1330 commented 1 year ago

I have an idea for a workaround: maybe write a script that automatically restarts the app when "[Errno 24] Too many open files" is detected? Restarting the app solved the issue.
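
A rough sketch of such a restart wrapper is shown below. It assumes the traceback is written to the process's stdout or stderr, and the names `run_until_fd_error` and `supervise` are hypothetical. It is a stopgap that hides the symptom and does not fix whatever is leaking descriptors.

```python
import subprocess
import sys
import time

# Text that appears in the traceback reported in this issue.
ERROR_MARKER = "[Errno 24] Too many open files"


def run_until_fd_error(command):
    """Run `command`, echo its output, and report whether ERROR_MARKER appeared.

    The child process is terminated as soon as the marker is seen.
    """
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    seen = False
    try:
        for line in proc.stdout:
            sys.stdout.write(line)
            if ERROR_MARKER in line:
                seen = True
                proc.terminate()
                break
    finally:
        proc.stdout.close()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # the child ignored SIGTERM
            proc.wait()
    return seen


def supervise(command, max_restarts=3, delay=1.0):
    """Restart `command` each time the marker appears; return the restart count."""
    restarts = 0
    while run_until_fd_error(command):
        if restarts >= max_restarts:
            break
        restarts += 1
        time.sleep(delay)
    return restarts


# Example invocation, using the command line from the original report:
# supervise([sys.executable, "-m", "systembridgebackend", "--no-gui"])
```

The cap on restarts is a deliberate choice so that a process which fails immediately on every start does not loop forever.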

jd1 commented 1 year ago

Just some remarks:

https://www.howtogeek.com/805629/too-many-open-files-linux/

I did a quick check on my system (Ubuntu 22.04, systembridgebackend --no-gui) and systembridge has less than 300 (~280-290) open "files". Many of them are network connections. I also have never seen this error.

Anto79-ops commented 1 year ago

Thanks for this information @jd1

I just updated System Bridge to the latest version and ran it without the GUI.

Can you please tell me how I can check how many files System Bridge opens? That way I can confirm whether or not it is System Bridge that is causing this issue. I'm using Ubuntu 22.04.

Cheers

jd1 commented 1 year ago

First you need to find the process ID of System Bridge, e.g. by looking at ps aux | grep python.

Then you can list all open files of this process (I will use 1337 as an example) by running: lsof -p 1337. If you want only the number of open files, run lsof -p 1337 | wc -l.

Remark: I'm writing this on my tablet, so there might be some typos in the command, but I hope I got them correct :) You can find most of them in the articles linked above or in the links of the articles.
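
The same check can be done from Python on Linux by reading the symlinks under `/proc/<pid>/fd`, which also shows how many descriptors are sockets rather than regular files. This is a sketch with hypothetical helper names. The totals will usually be lower than `lsof -p <pid> | wc -l`, because lsof also prints a header line and entries such as the working directory and memory-mapped libraries.

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse descriptor type."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.startswith("/"):
        return "file"
    return "other"


def open_fd_summary(pid):
    """Count the open descriptors of `pid` by type (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    summary = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed after the directory was listed
        summary[classify_fd_target(target)] += 1
    return summary


# Demo on this script's own process; replace with the PID found via ps.
summary = open_fd_summary(os.getpid())
print(dict(summary), "total:", sum(summary.values()))
```

A steadily rising "socket" count would point towards network connections that are never closed, while a rising "file" count would point towards regular files.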

Anto79-ops commented 1 year ago

Awesome! It's been running for 1 day and 8 hours now (which is the longest it has ever run on my system), with no more "too many open files" errors. The number of open files ranges from 100 to 133.

This is great news!

For those of you who have experienced this in the past, since July 2022: try updating to the new version.

Anto79-ops commented 1 year ago

2 days and still going, 100 open files. Nice!

I'll give this a couple more days and I'll close this issue. So far so good!

Now the HA PR needs to be pushed. I brought it to Frenck's attention and he labelled it as smash, so hopefully soon.

Anto79-ops commented 1 year ago

approaching 3 days now...still running! still only 100 open files.

I'm going to be closing this issue in 24 hrs.

thanks @timmo001 !

Anto79-ops commented 1 year ago

4 days and still running strong. 100 open files.

OK, this problem is officially solved.

How it was solved, I have no idea, BUT I just kept updating Ubuntu (including distro updates) and updated to the latest System Bridge, and it works.

Thanks all: @jd1 for bringing this up, and @timmo001 for the development. Cheers