[ ] The code changes are reflected in the documentation at docs/*.
\<Description of and rationale behind this PR>
When sending traffic from a non-local multicast sender, via an multicast router, it was only working sporadically. For some reason the router was dropping the subscriptions every now and then, and thus stopped forwarding the multicast traffic.
I suspect this could affect L2-devices doing IGMP-snooping too.
The IGMP Queries had a max-response-timeout of 100, but in packet logs this was clearly not respected by the device. Sometimes minutes passed between reports. After a lot of digging and printfs (oh btw, os_printf does not work. had to hack around to redefine it to dbg_printf), I found root cause: LWIP_RAND() aka r_rand() is defined as int (well, in app/include/lwip/igmp.h it's actually not defined at all, in sdk/esp_iot_sdk_v3.0-e4434aa/third_party/include/lwip/igmp.h it IS defined. Not sure which one is used?).
Anyway, signed means negative. and % 100 of negative value, casted to u16_t is potentially > 100. Experiment here: https://godbolt.org/z/vzWGEqzzx
This fix ensures that we actually cap timer to max_time and thus reply within the routers expected interval.
Fixes #\<GitHub-issue-number>.
dev
branch rather than for therelease
branch.docs/*
.\<Description of and rationale behind this PR> When sending traffic from a non-local multicast sender, via an multicast router, it was only working sporadically. For some reason the router was dropping the subscriptions every now and then, and thus stopped forwarding the multicast traffic. I suspect this could affect L2-devices doing IGMP-snooping too.
The IGMP Queries had a max-response-timeout of 100, but in packet logs this was clearly not respected by the device. Sometimes minutes passed between reports. After a lot of digging and printfs (oh btw,
os_printf
does not work. had to hack around to redefine it todbg_printf
), I found root cause:LWIP_RAND()
akar_rand()
is defined asint
(well, in app/include/lwip/igmp.h it's actually not defined at all, in sdk/esp_iot_sdk_v3.0-e4434aa/third_party/include/lwip/igmp.h it IS defined. Not sure which one is used?).Anyway, signed means negative. and
% 100
of negative value, casted tou16_t
is potentially > 100. Experiment here: https://godbolt.org/z/vzWGEqzzxThis fix ensures that we actually cap timer to
max_time
and thus reply within the routers expected interval.Another fix is to ensure we define LWIP_RAND() as (u32), like defined here: https://www.nongnu.org/lwip/2_0_x/group__compiler__abstraction.html#ga77370c377781ee7489e30eaf772ea05a but.. again, not sure about all lwip files, what goes where and so on..
Upstream has a lot of other fixes, but not sure lwip and nodemcu lifecycles integrate.. https://git.savannah.nongnu.org/cgit/lwip.git/tree/src/core/ipv4/igmp.c#n692