Open joseph-henry opened 6 years ago
I know this is an old issue, but I just ran into this while running libzt overnight, and I'd appreciate some guidance, if you can spare it. To your knowledge, does this lwIP bug/state prevent new connections? (If so, maybe I could set up some logic to restart the ZT node when ENFILE appears in zts_errno?) Or can it be ignored?
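The restart idea in the question could be sketched roughly as below. This is only an illustration of the control flow, not libzt code: `node_hooks`, `open_with_restart`, and the `fake_*` helpers are hypothetical names, the standard `ENFILE` from `<errno.h>` stands in for whatever value ends up in zts_errno, and in a real program the hooks would wrap the actual libzt calls. The error slot is checked even when the fd looks valid, because the thread reports an fd of 0 alongside the error message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hooks; a real program would wrap libzt calls here. */
struct node_hooks {
    int (*open_socket)(void *ctx, int *err_out); /* returns fd, stores error code (0 if none) */
    int (*restart_node)(void *ctx);              /* returns 0 on success */
    void *ctx;
};

/*
 * Open a socket. If the error slot reports ENFILE, restart the node once
 * and retry. Returns the fd, or -1 if the call still fails.
 */
static int open_with_restart(const struct node_hooks *h, int *restarts)
{
    assert(h != NULL && h->open_socket != NULL && h->restart_node != NULL);

    int err = 0;
    int fd = h->open_socket(h->ctx, &err);
    if (err != ENFILE) {
        return (fd >= 0 && err == 0) ? fd : -1;
    }

    /* Descriptor table looks exhausted: restart once, then try again. */
    if (h->restart_node(h->ctx) != 0) {
        return -1;
    }
    if (restarts != NULL) {
        (*restarts)++;
    }

    err = 0;
    fd = h->open_socket(h->ctx, &err);
    return (fd >= 0 && err == 0) ? fd : -1;
}

/* Fake node used only to exercise the logic above. */
struct fake_node {
    int exhausted;     /* nonzero: opens report ENFILE until a restart */
    int restart_calls; /* number of restarts performed */
};

static int fake_open(void *ctx, int *err_out)
{
    struct fake_node *f = ctx;
    *err_out = f->exhausted ? ENFILE : 0;
    return 0; /* mimics the thread: fd 0 comes back either way */
}

static int fake_restart(void *ctx)
{
    struct fake_node *f = ctx;
    f->exhausted = 0;
    f->restart_calls++;
    return 0;
}
```

Limiting the wrapper to a single restart per call keeps a persistent failure from turning into a restart loop; whether one restart is enough in practice is not something this thread establishes.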
No problem. It's been a while since I've poked around in that area of the code, but I do believe that would limit the creation of new connections, so a restart would be necessary. Do you know what the number of the last successfully created fd was? I think a general limit of 1024 exists and can be adjusted by configuring the MEMP_NUM_* constants in src/lwipopts.h. If this does seem to be your issue I can consider bumping that number up a bit.
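For reference, adjusting those limits might look something like the fragment below. The three option names are standard lwIP pool-size options, but which of them libzt's src/lwipopts.h actually sets, and to what values, is an assumption here; the numbers are illustrative only, not recommendations.

```c
/* lwipopts.h fragment (illustrative values, not libzt's actual settings) */
#define MEMP_NUM_NETCONN 2048 /* netconn structures (sockets/netconns) */
#define MEMP_NUM_UDP_PCB 2048 /* simultaneously open UDP "connections" */
#define MEMP_NUM_TCP_PCB 2048 /* simultaneously active TCP connections */
```

Raising a pool size only postpones exhaustion if descriptors are leaking, so it is more of a workaround than a fix.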
I just ran a test program this afternoon: zts_node_start followed by zts_net_join (an ad-hoc network) and then I let it sit, with a little loop that, once a minute, makes a zts_udp_server and closes it immediately. I log the returned fd, which is (as expected) always 0.
I saw my first one of these
socket(unix): Too many open files
socket: Too many open files
after about two hours. Then -- I don't know if this is interesting or not -- the program loops placidly again, without errors, until I get another Too many open files, exactly five minutes later. Then it repeats. An odd little cycle; something's on a timer inside libzt or lwIP?
Even in the loops that produce a Too many open files message, I get back an fd of 0 rather than an error code; of course, I don't know what would happen if I tried to use that socket.
(I also get bursts of recv: Connection reset by peer messages on a five-minute cadence, but I have always seen those & figured they were benign libzt diagnostic messages. Just mentioning for the sake of completeness.)
It seems that sometimes lwIP will report that there are too many files open on the system (ENFILE), but this seems to come only from lwIP's connection allocator and is not a system-wide issue. I suspect a maximum number of descriptors is being issued to a lwIP netconn and they aren't being freed properly by free_socket().
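To make the suspected failure mode concrete, here is a toy model of a fixed-size descriptor pool. It is not lwIP's allocator: `toy_pool`, `toy_alloc`, `toy_free`, and `toy_churn` are hypothetical names, and the four-slot limit is deliberately tiny. It only shows that if a close path skips releasing its slot, an open/close loop that normally keeps getting slot 0 back will eventually start failing with ENFILE, however few sockets are logically open at once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 4 /* tiny on purpose; stands in for a MEMP_NUM_* limit */

struct toy_pool {
    bool in_use[POOL_SLOTS];
};

/* Returns the lowest free slot, or -1 with *err = ENFILE when all are taken. */
static int toy_alloc(struct toy_pool *p, int *err)
{
    assert(p != NULL && err != NULL);
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            *err = 0;
            return i;
        }
    }
    *err = ENFILE;
    return -1;
}

/* Releases a slot so a later toy_alloc can reuse it. */
static void toy_free(struct toy_pool *p, int slot)
{
    assert(p != NULL && slot >= 0 && slot < POOL_SLOTS);
    p->in_use[slot] = false;
}

/*
 * Open and immediately close `rounds` times. When `leak` is true the
 * close never releases the slot, modelling a free path that is skipped.
 * Returns the number of failed allocations.
 */
static int toy_churn(struct toy_pool *p, int rounds, bool leak)
{
    int failures = 0;
    for (int i = 0; i < rounds; i++) {
        int err = 0;
        int slot = toy_alloc(p, &err);
        if (slot < 0) {
            failures++;
            continue;
        }
        if (!leak) {
            toy_free(p, slot);
        }
    }
    return failures;
}
```

With a healthy free path the loop never fails; with the leak, every attempt after the first `POOL_SLOTS` fails. The real behaviour described in the thread is more intermittent than this, so the model captures only the exhaustion mechanism, not the five-minute cycle.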