rtnl.listener dies on message burst

I noticed that the rtnl listener callback I set up in a ucode script appears to randomly "die", without any error message, while the rest of the script keeps operating normally.

After a bit of digging, I think I have tracked it down to a resource exhaustion of some sort. The bug can be reproduced using the attached ucode script, which sets up a simple listener on RTNLGRP_NEIGH that prints the received messages.

Everything goes well until the neigh garbage collector kicks in and deletes a large number of neigh entries, resulting in a burst of hundreds of messages being delivered. The script will typically appear to hang after printing anywhere between none and the first few of the delete messages ("cmd": 29, i.e. RTM_DELNEIGH), with no error whatsoever.
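To check independently of ucode how large such a burst is, and whether the kernel drops messages during it, something like the following Python sketch could be run next to the ucode script. It is not taken from the attached script. The function names are made up for illustration, and the idea that the exhaustion shows up as ENOBUFS on the netlink socket (receive buffer overflow) is only an assumption that this sketch would help confirm or rule out.

```python
import errno
import socket
import struct
from collections import Counter

RTM_NEWNEIGH = 28
RTM_DELNEIGH = 29            # shows up as "cmd": 29 in the ucode output
RTMGRP_NEIGH = 1 << 2        # legacy bind() bitmask for RTNLGRP_NEIGH (group 3)
NLMSG_HDR = struct.Struct("=IHHII")  # nlmsghdr: len, type, flags, seq, pid


def count_message_types(buf):
    """Walk the netlink messages packed in buf and count them by type."""
    counts = Counter()
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message, stop parsing
        counts[msg_type] += 1
        offset += (length + 3) & ~3  # round up to 4 bytes, as NLMSG_ALIGN does
    return counts


def watch_neigh():
    """Hypothetical helper: print running totals of neigh messages forever."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_NEIGH))  # pid 0 lets the kernel assign a port id
    totals = Counter()
    while True:
        try:
            totals.update(count_message_types(sock.recv(65536)))
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise
            # Assumption under test: the burst overflows the receive buffer.
            print("ENOBUFS: the kernel dropped messages for this socket")
        print("new:", totals[RTM_NEWNEIGH], "del:", totals[RTM_DELNEIGH])
```

Calling `watch_neigh()` on the affected router while the GC runs would show how many delete messages arrive in one burst and whether ENOBUFS is reported. If it never is, buffer overflow is probably not the explanation.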

On a system where the neigh GC is set like so:

net.ipv4.neigh.default.gc_thresh1=512
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096

(values fairly typical for a busy router), the garbage collector may delete hundreds of entries in one go when it kicks in (once more than 512 entries have been created), triggering the hang. I have not been able to reliably reproduce this bug when gc_thresh1 is set to e.g. 128, which typically results in GC kicking in more frequently and pruning only a few dozen entries per run. So the problem only seems to occur when more than a certain number of messages arrive "at once".
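For anyone trying to reproduce this, it may help to first confirm that the test system really runs with the thresholds listed above. The following is a minimal sketch, with hypothetical helper names, that parses sysctl-style `key=value` lines and compares them against the live values under `/proc/sys`. It assumes the usual mapping of dots in the sysctl name to path separators, which holds for these three keys.

```python
from pathlib import Path


def parse_sysctl_lines(text):
    """Parse 'key=value' lines (sysctl.conf style) into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything without a value
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def read_live(key, proc_root="/proc/sys"):
    """Read the current value of a sysctl key from procfs."""
    return int(Path(proc_root, *key.split(".")).read_text().strip())


def mismatches(expected, proc_root="/proc/sys"):
    """Return {key: (expected, live)} for every key whose live value differs."""
    result = {}
    for key, want in expected.items():
        live = read_live(key, proc_root)
        if live != want:
            result[key] = (want, live)
    return result


EXPECTED = parse_sysctl_lines("""
net.ipv4.neigh.default.gc_thresh1=512
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096
""")
```

On the router, `mismatches(EXPECTED)` returning an empty dict would mean the GC thresholds match the ones used in this report.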

I have attached a memdump of the script, taken after the hang.

rtnlbug.uc.txt ucode.1703872887.23407.memdump.txt
