RX stops receiving on fast, large transmissions

isengard412 commented 7 years ago

The Ethernet library stops receiving packets if too much time is spent in the rx.packet_ready() case. This is easy to reproduce with the 100 Mbit example: modify the rx.packet_ready() case to add a timer and wait for some time (waiting XS1_TIMER_HZ or XS1_TIMER_KHZ ticks is definitely long enough; the modified code is attached). The longer the wait, the sooner the error occurs. With a short wait (for example 100u), everything works fine.

Now send large packets (1500 bytes) at full speed to the XMOS device. With a 100u wait, data is received at about 98.2 Mbit/s. If much more time is spent in the rx.packet_ready() case, the device either crashes or becomes unresponsive.

If it crashes, it raises an ET_LOAD_STORE exception, and the buffer pointers appear to be wrong at that point. The RX buffer is circular. Normally, when all 32 slots are full, read_bank_wr_ptr sits right behind read_bank_rd_ptr and new packets are dropped. As packets are read, read_bank_rd_ptr advances and read_bank_wr_ptr writes the next packets into the freed slots. When the error occurs, however, read_bank_rd_ptr is somehow equal to read_bank_wr_ptr while the buffer is full (next_buffer = -1). On the next read event, mii_lite_get_in_buffer simply returns zero as the data address. This invalid data is forwarded to the upper layers, and the packet is then released by mii_lite_free_in_buffer. Because the data address is zero, that function subtracts 8 from it and tries to write to the resulting address (0xFFFFFFF8), which is obviously not possible, so it crashes.
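To make the pointer invariant easier to reason about, here is a toy Python model of a circular RX queue. It is a sketch only: `RxRing`, `state_is_consistent` and `free_write_addr` are hypothetical names, not lib_ethernet code. It assumes the convention described in the comment, where the writer stops one slot behind the reader, so `rd == wr` can only ever mean "empty". Under that assumption, the state reported here (`rd == wr` while the buffer is flagged full, the role the report attributes to `next_buffer = -1`) is impossible, and the model shows why a zero data address turns into a write to 0xFFFFFFF8 on a 32-bit target.

```python
class RxRing:
    """Toy circular RX queue (hypothetical model, not lib_ethernet's code).

    Assumed convention: one slot is always left unused, so
      rd == wr            -> empty
      (wr + 1) % n == rd  -> full (new packets are dropped)
    With this convention, rd == wr can never mean "full".
    """

    def __init__(self, n_slots: int = 32):
        self.n = n_slots
        self.slots = [None] * n_slots
        self.rd = 0
        self.wr = 0
        self.dropped = 0

    def is_empty(self) -> bool:
        return self.rd == self.wr

    def is_full(self) -> bool:
        return (self.wr + 1) % self.n == self.rd

    def push(self, pkt) -> bool:
        """Store a packet, or drop it (and count the drop) when full."""
        if self.is_full():
            self.dropped += 1
            return False
        self.slots[self.wr] = pkt
        self.wr = (self.wr + 1) % self.n
        return True

    def pop(self):
        """Return the oldest packet, or None when empty.

        Callers must check for None instead of using it as a buffer address.
        """
        if self.is_empty():
            return None
        pkt = self.slots[self.rd]
        self.slots[self.rd] = None
        self.rd = (self.rd + 1) % self.n
        return pkt


def state_is_consistent(rd: int, wr: int, full: bool, n: int = 32) -> bool:
    """True if an externally tracked 'full' flag agrees with the pointers."""
    return full == ((wr + 1) % n == rd)


def free_write_addr(buf_addr: int) -> int:
    """Address written when freeing a buffer whose header sits 8 bytes
    before the data pointer, using 32-bit wrap-around arithmetic."""
    return (buf_addr - 8) & 0xFFFFFFFF
```

In this model, pushing 40 packets into a 32-slot ring stores 31 and drops 9, and `state_is_consistent(rd=5, wr=5, full=True)` returns `False`, which is the kind of check that could flag the corrupted state before a zero address reaches the free routine.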

A short Python 3 script is attached; it just sends random UDP packets to the device. The mii_lite_data_t data is also attached.
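The attached script itself is not reproduced in this thread, so the following is only a hypothetical stand-in with the same purpose: send random UDP datagrams to the device as fast as the host allows. The function name `flood` and the command-line layout are assumptions; the device address and port must be supplied by the user. The payload size of 1472 bytes is the largest UDP payload that fits a 1500-byte IPv4 MTU without fragmentation (1500 minus a 20-byte IPv4 header minus an 8-byte UDP header).

```python
import os
import socket
import sys
import time

# 1500-byte MTU - 20-byte IPv4 header (no options) - 8-byte UDP header.
MAX_UDP_PAYLOAD = 1472


def flood(host: str, port: int, count: int, size: int = MAX_UDP_PAYLOAD) -> int:
    """Send `count` datagrams of `size` random bytes back to back.

    Returns the total number of payload bytes handed to the OS.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sent += sock.sendto(os.urandom(size), (host, port))
    return sent


if __name__ == "__main__":
    # Usage (hypothetical): python3 flood.py <device-ip> <udp-port> [count]
    host, port = sys.argv[1], int(sys.argv[2])
    count = int(sys.argv[3]) if len(sys.argv) > 3 else 100_000
    start = time.perf_counter()
    total = flood(host, port, count)
    elapsed = time.perf_counter() - start
    print(f"sent {total} bytes in {elapsed:.2f} s "
          f"({total * 8 / elapsed / 1e6:.1f} Mbit/s offered)")
```

The printed rate is what the host offered, not what the device received; the actual send rate depends on the host's NIC and OS and may be capped at line rate.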

AN00120_100Mbit_ethernet_demo.zip test_eth.zip

pthedinger commented 7 years ago

Could you try the code in the pull request (https://github.com/xmos/lib_ethernet/pull/11)? I have definitely fixed this issue in one case. It might not be the same as this case, but it would be good to know whether it is still an issue after applying the pull request.

isengard412 commented 7 years ago

I already tried your pull request, but sadly it did not solve my problem. I have spent a lot of time trying to solve the problem on my own, but have not succeeded so far.

pthedinger commented 7 years ago

OK, thanks, we will investigate further.

Redeye92 commented 6 years ago

Has any progress ever been made on this? I seem to be having the same problem; the Ethernet MII certainly crashes in exactly the same way. This is turning into a pretty big problem for me, as I now have products crashing at random that have to be power-cycled to bring them back online.

The code supplied above reliably reproduces the crash for me. In my products, the problem is normally triggered by a PC connecting to multiple XMOS devices at the same time. This causes a mini "ARP storm": the devices all try to resolve the PC's IP address to a MAC address and send a flood of ARP broadcast messages. So it looks to me as if the problem is not so much the volume of data as the number of packets received while the code is stuck in the rx.packet_ready() case.
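A rough back-of-envelope model supports the packet-count hypothesis. This sketch assumes standard Ethernet framing overhead (8-byte preamble/SFD plus a 12-byte inter-frame gap), that the RX queue holds 32 packets regardless of their size (as the "32 slots" in the original report suggests), and a fixed handling time per packet in rx.packet_ready(); reading "100u" as 100 µs is also an assumption. The function names are hypothetical.

```python
LINE_RATE_BPS = 100_000_000   # 100 Mbit/s
PREAMBLE_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12
RX_SLOTS = 32                 # slot count quoted in the issue (assumed per packet)


def frames_per_second(frame_bytes: int) -> float:
    """Back-to-back frame rate for frames of `frame_bytes`
    (Ethernet header + payload + FCS), including preamble and gap."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return LINE_RATE_BPS / wire_bits


def time_to_fill(frame_bytes: int, service_time_s: float) -> float:
    """Seconds until all RX_SLOTS are occupied when each packet takes
    `service_time_s` to handle; infinity if the consumer keeps up."""
    arrival = frames_per_second(frame_bytes)
    service = 1.0 / service_time_s
    if service >= arrival:
        return float("inf")
    return RX_SLOTS / (arrival - service)


if __name__ == "__main__":
    # 1518 = 1500-byte payload + 14-byte header + 4-byte FCS (untagged frame).
    # 64 = minimum frame size; an ARP request is padded up to this.
    for frame in (1518, 64):
        for wait in (100e-6, 1e-3):
            print(f"{frame:5d}-byte frames, {wait * 1e6:6.0f} us/packet: "
                  f"fill time {time_to_fill(frame, wait) * 1e3:.3f} ms")
```

Under these assumptions, full-size frames arrive about every 123 µs, so a 100 µs handler keeps up indefinitely, whereas minimum-size frames arrive about every 6.7 µs (roughly 18 times as often) and fill 32 slots in about 0.23 ms even with a 100 µs handler. That would be consistent with both observations: longer waits fail sooner, and bursts of small broadcast packets are enough to trigger the problem.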