nashif opened this issue 7 years ago
by Paul Sokolovsky:
For a bit more info: I saw this before https://gerrit.zephyrproject.org/r/10902 (dynamic PHY configuration on cable connect/disconnect), and I see it with that change too.
by Paul Sokolovsky:
A case walkthrough:
by Paul Sokolovsky:
OK, a response actually always arrives, but sometimes only after 4-5 s. Quickly typing the next one makes the previous response arrive immediately.
I never saw any of this with QEMU.
by Paul Sokolovsky:
OK, in this state TCP echo doesn't work at all, so it's definitely another manifestation of GH-1577.
by Paul Sokolovsky:
Now I've been torturing it for a full 3 minutes, and it still works.
by Paul Sokolovsky:
The same session has now survived 1000 pings (standard pings), and all 6 communication channels ((ICMP + UDP + TCP) × (IPv4 + IPv6)) work well, without any delays. (I pay special attention to pings because I usually start testing with them, and in most cases the issue shows up with them soon.)
This proves that it is not an inherent, systematic issue but a race condition. We can't even be sure the race is in eth_mcux, because the Zephyr IP stack surely still has its own share of races and oddities. Most likely, it's an eth_mcux race compounded by a Zephyr stack race.
That was the "good" news; the bad news is that we can't reproduce it at will (though it definitely recurs naturally quite often).
Reported by Paul Sokolovsky:
I never saw this issue with QEMU, so my theory is that it's an frdm_k64f Ethernet driver issue.
Sometimes, instead of the usual sub-millisecond pings:
you start getting:
So far, I haven't been able to find an exact way to reproduce it, but it happens often: I see it in every testing session (5-6 board resets/Ethernet cable reconnects).
When this happens, all other packets seem to be delayed too, not just pings. For example, sending UDP with netcat to echo_server shows a similar delay of about 1 s.
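To put numbers on the delay instead of eyeballing netcat output, one could time UDP round trips against echo_server from the host. This is an illustrative sketch, not part of the original report: the function name `udp_rtts` is hypothetical, and the board address and port you pass to it are assumptions about your setup (Zephyr's echo_server sample is believed to listen on UDP port 4242, but check your build).

```python
import socket
import time


def udp_rtts(host, port, count=10, timeout=5.0, payload=b"ping"):
    """Hypothetical helper: send `count` UDP datagrams to an echo server and
    return each round-trip time in seconds (None if no reply within `timeout`)."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            # Tag each datagram with a sequence number so a late reply to an
            # earlier request is not mistaken for the reply to this one.
            msg = b"%d:" % seq + payload
            start = time.monotonic()
            sock.sendto(msg, (host, port))
            deadline = start + timeout
            rtt = None
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                sock.settimeout(remaining)
                try:
                    data, _ = sock.recvfrom(2048)
                except socket.timeout:
                    break
                if data == msg:
                    rtt = time.monotonic() - start
                    break
                # Stale reply for an earlier sequence number: keep waiting.
            rtts.append(rtt)
    return rtts
```

For example, `udp_rtts("192.0.2.1", 4242)` (hypothetical board address) returns one RTT per datagram. In the healthy state these should be small, while the stuck state described above would show up as values around 1 s or as `None` timeouts. Matching replies by sequence number matters here, since a delayed reply that only arrives after the next request is sent would otherwise be counted against the wrong request.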
(Imported from Jira ZEP-1678)