gopher-motorsports / data-logging-module

Data Logging Module for the Go4-22

Rare packet corruption #54

Open caljay98 opened 2 years ago

caljay98 commented 2 years ago

Pretty sure this has something to do with the first and last block being written to, and maybe with the generic single-byte data being written without being aligned to 32 bits.

olale013 commented 2 years ago

This test was on a dev-board with jumper wires and sd breakout

Debug 1, Debug 2, Debug 3: [screenshots not preserved]

In all three cases I checked the buffer just before f_write to note the last packet. The bad packets do not occur at the end of the f_write.

There's a weird pattern with the first two bad packets having the same packet count and the same issue. Other than that the issues seem to be random.

Average invalid packets for these three runs: ~0.03%

If it weren't for the consistency in the first two bad packets, I would call it a signal transmission error rather than a code bug.

olale013 commented 2 years ago

Testing with the dlm-sim...

Every 45-46 packets (of 11 bytes each) there's a chance that a bad packet will appear. Usually there are 45 consecutive good packets, sometimes 138, etc., but the run length is always roughly a multiple of 45/46.

45 × 11 = 495 bytes, plus some good bytes from the next packet, which lands right at the edge of a 512-byte sector. Bad writes seem to happen at the sector edge, but not always. Forcing f_write to write exactly 512 bytes does not fix the issue.
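To sanity-check the sector-edge idea, here is a rough sketch that lists which 11-byte packets would have their first and last byte in different 512-byte sectors. It assumes packets are laid out back to back starting at byte 0 of the file; `straddles_sector` and `report_straddlers` are made-up names for illustration, not functions from the dlm code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512u /* SD sector size quoted above */
#define PACKET_SIZE 11u  /* packet size used in the dlm-sim test */

/* Returns 1 if packet `index` (0-based, packed back to back from byte 0)
 * has its first and last byte in different sectors, 0 otherwise. */
static int straddles_sector(uint32_t index, uint32_t packet_size,
                            uint32_t sector_size)
{
    assert(packet_size > 0u && sector_size > 0u);
    uint32_t first = index * packet_size;
    uint32_t last = first + packet_size - 1u;
    return (first / sector_size) != (last / sector_size);
}

/* Prints every straddling packet among the first `count` packets and
 * returns how many were found. */
static uint32_t report_straddlers(uint32_t count)
{
    uint32_t found = 0;
    for (uint32_t i = 0; i < count; i++) {
        if (straddles_sector(i, PACKET_SIZE, SECTOR_SIZE)) {
            printf("packet %lu (bytes %lu..%lu) crosses a sector edge\n",
                   (unsigned long)i,
                   (unsigned long)(i * PACKET_SIZE),
                   (unsigned long)(i * PACKET_SIZE + PACKET_SIZE - 1u));
            found++;
        }
    }
    return found;
}
```

Calling `report_straddlers(140)` flags packets 46, 93 and 139, which leaves 45 to 46 clean packets between each one. That spacing matches the observed pattern, but it only shows the numbers are consistent with a sector-edge problem; it says nothing about the cause.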

UPDATES:

olale013 commented 2 years ago

Packets must be a multiple of 8 bytes long.

I still don't understand the underlying issue. f_write is perfectly capable of writing an odd number of bytes.

The dlm-sim is now functional with a uniform packet size and a new append_packet function.
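For reference, a minimal sketch of what padding each packet up to a multiple of 8 bytes could look like. This is not the actual append_packet from the repo; `append_packet_padded`, `padded_len` and `PACKET_ALIGN` are hypothetical names, and zero-filling the padding is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PACKET_ALIGN 8u /* packets must be a multiple of 8 bytes long */

/* Round n up to the next multiple of PACKET_ALIGN (a power of two). */
static size_t padded_len(size_t n)
{
    return (n + PACKET_ALIGN - 1u) & ~(size_t)(PACKET_ALIGN - 1u);
}

/* Hypothetical helper: copy `len` bytes of `pkt` into `buf` at offset
 * `*used`, zero-fill up to the next 8-byte multiple, and advance `*used`.
 * Returns 0 on success, -1 if the padded packet does not fit. */
static int append_packet_padded(uint8_t *buf, size_t cap, size_t *used,
                                const uint8_t *pkt, size_t len)
{
    assert(buf != NULL && used != NULL && pkt != NULL);
    size_t total = padded_len(len);
    if (*used > cap || total > cap - *used) {
        return -1; /* not enough room left in the buffer */
    }
    memcpy(buf + *used, pkt, len);
    memset(buf + *used + len, 0, total - len); /* padding bytes */
    *used += total;
    return 0;
}
```

With this layout an 11-byte packet takes 16 bytes in the buffer, so every packet starts on an 8-byte boundary and a 512-byte sector always holds a whole number of packets.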

olale013 commented 2 years ago

Spoke too soon. This fix appears to work for the dlm-sim, but not the dlm.

Using packet sizes of 16/24/32 bytes does appear to reduce the number of errors though.

caljay98 commented 2 years ago

We should try writing a statically defined block of memory and see if the bug still happens.
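Something like the sketch below could work for that test. It sets up a static, 32-bit aligned buffer with a known pattern plus a check for the data read back off the card. The buffer size, the pattern and the names (`test_pattern`, `fill_pattern`, `find_first_mismatch`) are all assumptions; the actual write would still go through f_write as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATTERN_LEN 1024u /* two 512-byte sectors; size is arbitrary */

/* Backing storage is uint32_t so the buffer is guaranteed to be
 * 32-bit aligned, and it is static so it never lives on the heap or stack. */
static uint32_t pattern_storage[PATTERN_LEN / 4u];
static uint8_t *const test_pattern = (uint8_t *)pattern_storage;

/* Fill the buffer with a pattern that differs from sector to sector,
 * so a shifted or repeated sector shows up in the comparison. */
static void fill_pattern(void)
{
    for (size_t i = 0; i < PATTERN_LEN; i++) {
        test_pattern[i] = (uint8_t)(i ^ (i >> 8));
    }
}

/* Compare data read back from the card against the pattern.
 * Returns the index of the first differing byte, or `len` if all match. */
static size_t find_first_mismatch(const uint8_t *readback, size_t len)
{
    assert(readback != NULL && len <= PATTERN_LEN);
    for (size_t i = 0; i < len; i++) {
        if (readback[i] != test_pattern[i]) {
            return i;
        }
    }
    return len;
}
```

If the corruption still shows up with this buffer, the packet-building code is probably not the culprit; if it goes away, the alignment or lifetime of the normal buffers is the next thing to look at.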

olale013 commented 2 years ago

Could be related:

CMSIS V2 appears to trigger a bug somewhere that causes memory corruption in the Ethernet code, particularly around the receive semaphore.