klutchell / balena-pihole

Pi-hole is a Linux network-level advertisement and Internet tracker blocking application.
https://pi-hole.net
MIT License

WARN: Could not accept() in listener() (/__w/FTL/FTL/src/api/socket.c:267): Bad file descriptor #162

Closed klutchell closed 2 years ago

klutchell commented 2 years ago

My FTL logs are filling up with this at an incredible rate. GB per day. I hope it's only on my device but I need to get it sorted asap.

[2022-08-01 11:04:56.679 5613/T5614] IPv4 telnet error: Success (0)
[2022-08-01 11:04:56.679 5613/T5614] WARN: Could not accept() in listener() (/__w/FTL/FTL/src/api/socket.c:267): Bad file descriptor
[2022-08-01 11:04:56.679 5613/T5614] IPv4 telnet error: Success (0)
[2022-08-01 11:04:56.679 5613/T5614] WARN: Could not accept() in listener() (/__w/FTL/FTL/src/api/socket.c:267): Bad file descriptor
[2022-08-01 11:04:56.679 5613/T5614] IPv4 telnet error: Success (0)
[2022-08-01 11:04:56.679 5613/T5614] WARN: Could not accept() in listener() (/__w/FTL/FTL/src/api/socket.c:267): Bad file descriptor
[2022-08-01 11:04:56.680 5613/T5614] IPv4 telnet error: Success (0)
[2022-08-01 11:04:56.680 5613/T5614] WARN: Could not accept() in listener() (/__w/FTL/FTL/src/api/socket.c:267): Bad file descriptor
[2022-08-01 11:04:56.680 5613/T5614] IPv4 telnet error: Success (0)

Will be logging some of my investigation here.

klutchell commented 2 years ago

Wonder if resolving this issue will just fill up my tmpfs and then what? Will the container restart or will the device hang?

eiddor commented 2 years ago

Ugh - mine crashed, I assume for the same reason.

image

image

Is there a way to recover it remotely?

klutchell commented 2 years ago

@eiddor yup, you should have VPN access via the dashboard still? It looks like you do.

Open a shell session into the pihole container and rm -rf /var/log/pihole/*
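
A note on the cleanup above: if pihole-FTL is still running and holds the log open, removing the file may not release the disk space until the process restarts, so truncating in place can be a gentler first step. A minimal sketch, assuming the logs are regular files directly under /var/log/pihole (the path mentioned above); `truncate_logs` is a hypothetical helper name, not part of Pi-hole or balena:

```shell
#!/bin/sh
# Truncate every regular file in a log directory without unlinking it,
# so a process that still has the file open can release the disk space.
# The default directory is an assumption taken from the path above.
truncate_logs() {
    dir="${1:-/var/log/pihole}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip directories and unmatched globs
        : > "$f"                  # truncate to zero bytes in place
    done
}
```

Run it inside the container as `truncate_logs /var/log/pihole`. The files stay in place with their existing ownership and permissions; only their contents are dropped.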

klutchell commented 2 years ago

If the container isn't running, we can find that path on the host...

find /mnt/data/docker/overlay2 -name FTL.log -delete

klutchell commented 2 years ago

I wonder if it's related to this commit? https://github.com/pi-hole/PADD/pull/235

eiddor commented 2 years ago

The container appears to be running, but I can't connect to it. [image]

I'm on the host, but find on this shell doesn't like -delete for some reason. I found it manually :-)
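
For shells where find rejects -delete (some BusyBox builds omit it, which may be the cause here, though that is a guess), the standard -exec action can do the same job. A sketch under that assumption; `delete_named` is a hypothetical helper name:

```shell
#!/bin/sh
# Delete all regular files with a given name under a directory tree,
# using the POSIX -exec action instead of the non-standard -delete.
delete_named() {
    root="$1"
    name="$2"
    find "$root" -type f -name "$name" -exec rm -f {} \;
}
```

On the host this would be called as `delete_named /mnt/data/docker/overlay2 FTL.log`, matching the find command suggested earlier in the thread.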

Deleted the file and rebooted, but now it's not coming back - I might have to recover it this weekend. :-(

klutchell commented 2 years ago

@eiddor I can help recover this whenever you have time.

Currently it looks like the balena engine (aka the docker daemon) is not running, possibly because of a partially written file left over from when the disk ran out of space. When you have time you can check a couple of things so we know what to try next.

Check the engine logs to see why it can't start:

journalctl -u balena

Check the space on the data partition:

df -h
du -cksh /mnt/data/docker/*
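
To catch a filling data partition before the engine stops, the df check above could be wrapped in a small threshold test. This is only a sketch: `usage_percent` and `check_usage` are hypothetical helper names, the 90% default is an arbitrary choice, and the awk parsing assumes the filesystem name contains no spaces.

```shell
#!/bin/sh
# Print the used-space percentage (integer, no % sign) of the filesystem
# that contains the given path, using POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report usage and return non-zero when it reaches the limit.
# The default limit of 90 is an arbitrary illustrative value.
check_usage() {
    path="$1"
    limit="${2:-90}"
    used=$(usage_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

For example, `check_usage /mnt/data 90` prints a one-line status and returns a non-zero exit code once the partition is at or above 90%.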

klutchell commented 2 years ago

Also, if you don't have any customized settings, ad lists, devices, etc. (or if you have a recent backup), we could just purge the data dir and reboot.

rm /mnt/data/remove_me_to_reset
reboot

eiddor commented 2 years ago

@klutchell Unfortunately the entire device did not come back after I deleted FTL.log and rebooted last night. It's showing offline in the dashboard and is not even pingable on my network. I suspect the full fs somehow corrupted the host, but that's purely a guess.

I'm going to have to flash a new sd card this weekend, I think.

klutchell commented 2 years ago

Generally a full data partition should not be able to impact the hostOS; that's why we keep the rootfs read-only. But it's possible something else is going on here.

eiddor commented 2 years ago

Ok - Had to create a new device and flash a new card, so it's back online now (pinned to the -1 release with the old version of PADD.)

I'm remote from that device so I can't test on it, but I can add a second fleet/device and do some testing if you have some ideas.

(FWIW - It would be neat to be able to gen a new image for an existing device. Do you see a use case for that?)

eiddor commented 2 years ago

400 MB to 12.3 GB overnight

image

klutchell commented 2 years ago

> 400 MB to 12.3 GB overnight

@eiddor this is with the previous version of PADD? I haven't seen this issue since I reverted.

Can you capture some of the logs so we can see if it's the same message filling up the disk?
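
One way to check whether a single message dominates the log is to strip the timestamp prefix and count identical lines. A minimal sketch, assuming the `[timestamp pid/thread] message` layout shown in the excerpt at the top of this issue; `top_messages` is a hypothetical helper name:

```shell
#!/bin/sh
# Summarise which messages dominate an FTL-style log: strip the leading
# "[timestamp pid/thread] " prefix, then count identical messages and
# list the most frequent ones first.
top_messages() {
    sed 's/^\[[^]]*\] //' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Calling `top_messages /var/log/pihole/FTL.log 5` would list the five most frequent messages with their counts (the log path is an assumption based on the directory mentioned earlier). On a multi-GB file the sort can take a while, so running it on a `tail` of the log may be more practical.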

eiddor commented 2 years ago

> @eiddor this is with the previous version of PADD? I haven't seen this issue since I reverted.

Oh, sorry - I should have been clearer! I set up a test fleet with a local device and the current release, then I upgraded it to the new version of PADD and am seeing the same problems. Now I have something we can test with that I don't actually use for DNS.

klutchell commented 2 years ago

> Now I have something we can test with that I don't actually use for DNS.

I was going to set up a similar testing device but haven't had a chance yet. Though it seems you've confirmed the issue is only present when using the new version of PADD (maybe we mark that PR as draft for now).

eiddor commented 2 years ago

Oops - Went to mark it as draft, and requested your review instead :-)

eiddor commented 2 years ago

Looks like this is being tracked here: https://github.com/pi-hole/PADD/issues/252

Specifically this post.

klutchell commented 2 years ago

Nice find! I guess we can just wait it out, since I have very little time this week anyway!

klutchell commented 2 years ago

Hopefully resolved by https://github.com/klutchell/balena-pihole/pull/167

klutchell commented 2 years ago

Resolved by https://github.com/klutchell/balena-pihole/pull/167