openxc / vi-firmware

OpenXC-compatible firmware for PIC32 and LPC1768
http://vi-firmware.openxcplatform.com
BSD 3-Clause "New" or "Revised" License

Unpredictable data returned with passthrough handler on LPC1769 #47

Closed peplin closed 11 years ago

peplin commented 11 years ago

With a CAN bus bench test rig sending vehicle data and the LPC1769-based translator flashed with the passthrough firmware, I get good raw passthrough data for 5-10 seconds, then about 5 translated OpenXC messages, and then it freezes. The translator's debug log complains that the USB queue is full and that ~5 messages were dropped, then it stops.

The translator is not hard faulted, as I can query for the version over USB's EP0 and it works fine.

The translated messages are really puzzling: there is no occurrence of a string like steering_wheel_angle anywhere in the code base when building for passthrough, and yet it shows up in the output. I wondered whether it was still sitting in un-erased flash memory on the chip, so I dumped the firmware back to my hard drive. Running strings on the dump didn't return any results for steering_wheel_angle, but I'm not positive this was a complete test.
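Since a default `strings` run only reports printable single-byte sequences, a more direct check is to search the raw dump for the exact byte pattern in more than one encoding. The following is a minimal sketch, not the procedure used above; the function names and the sample dump contents are hypothetical.

```python
from typing import Dict, List


def find_all(data: bytes, needle: bytes) -> List[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def scan_dump(data: bytes, text: str) -> Dict[str, List[int]]:
    """Search for text in encodings that a plain `strings` run may skip."""
    candidates = {
        "ascii": text.encode("ascii"),
        "utf-16-le": text.encode("utf-16-le"),
    }
    return {name: find_all(data, pattern) for name, pattern in candidates.items()}


if __name__ == "__main__":
    # Hypothetical bytes standing in for the file read back from the chip.
    dump = b"\xff" * 16 + b'{"name":"steering_wheel_angle"}' + b"\x00" * 16
    print(scan_dump(dump, "steering_wheel_angle"))
```

An empty result for every encoding would still not rule out the string living in RAM only, or in a flash region that the dump did not cover.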

I added additional debug logging to the Python library to print out the complete message buffer received via USB and not just the successfully parsed messages. When the translated messages do appear and it freezes, we get a lot of what looks like random memory content over USB. It's a bunch of junk, with some translated messages mixed in.
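The extra logging can be sketched roughly as below. This is an illustration, not the actual change to the Python library: it assumes messages arrive as newline-delimited JSON, and `hexdump` and `split_messages` are hypothetical helper names.

```python
import json


def hexdump(buffer: bytes, width: int = 16) -> str:
    """Format a raw buffer as offset, hex bytes and printable ASCII."""
    lines = []
    for offset in range(0, len(buffer), width):
        chunk = buffer[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


def split_messages(buffer: bytes):
    """Separate parseable JSON messages from fragments that fail to parse."""
    parsed, junk = [], []
    for piece in buffer.split(b"\n"):
        if not piece:
            continue
        try:
            parsed.append(json.loads(piece.decode("utf-8")))
        except (UnicodeDecodeError, ValueError):
            # Keep the raw bytes so the junk can be inspected later.
            junk.append(piece)
    return parsed, junk


if __name__ == "__main__":
    # Hypothetical received buffer: one valid message followed by junk.
    raw = b'{"name": "steering_wheel_angle", "value": 1.5}\n\xff\xfe\x00\x12\n'
    messages, leftovers = split_messages(raw)
    print(messages)
    for fragment in leftovers:
        print(hexdump(fragment))
```

Keeping the unparsed fragments rather than discarding them is what makes the junk visible alongside the translated messages.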

My theory is that in the LPC1769-specific code, we're somehow walking off the end of an array and starting to return the contents of invalid memory addresses. I need to double-check this, but I'm fairly certain we don't see this problem when running the CAN emulator, which indicates the problem is most likely in the passthrough-specific or CAN-specific code.

peplin commented 11 years ago

Today the behavior is slightly different: it outputs passthrough data for a while and then hard locks. I disabled the receiveCanMessage function (so nothing will be read from the CAN buffer, and nothing will be queued) and after sending some traffic from the tester, it still locks. This might help narrow it down to the CAN interrupt.

peplin commented 11 years ago

CAN_IRQHandler disabled, running version check every 1s, no CAN messages being sent from tester: works fine forever.

CAN_IRQHandler disabled, running version check every 1s, CAN messages being sent from tester: locks up almost immediately.

CAN_IRQHandler disabled, running version check only at start and end of sending test data, CAN messages being sent from tester: version check at the end shows that it's hard locked.

CAN_IRQHandler enabled, running version check only at start and end of sending test data, CAN messages being sent from tester: version check still works after sending the test data set multiple times.

CAN_IRQHandler enabled, running openxc-dump, CAN messages being sent from tester: some data comes through, then I see the start of USB send queue ful... in the debug log, USB stops, and then the version check fails: it's hard locked.

Summary:

peplin commented 11 years ago

CAN_IRQHandler enabled, sendToHost commented out for USB sending: doesn't die, see USB send queue full log messages.

The same, but actually pulling data off of the USB queue into a buffer, but not writing it out to the endpoint: queue doesn't overflow, still alive after sending test data.

The same, but clearing the endpoint after the buffer: good.

The same, but calling Endpoint_Write_Stream_LE with bytes: dies.

Calling Endpoint_Write_Stream_LE with 64 bytes or less: seems OK, but running the emulator with this limitation causes it to crash. The messages generated by the emulator can be longer than 64 bytes; however, that shouldn't cause it to die. After flashing the emulator and then going back to passthrough, I now see the mystery translated messages again. This lends further credence to the theory that we're accessing invalid memory.
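To make the experiments above concrete, here is a host-side model of the two behaviours being tested: a bounded send queue that drops messages when full, and a drain step that never hands more than 64 bytes to the write call. It is a sketch only, not the firmware code. `SendQueue`, `QUEUE_CAPACITY` and the `write` callback are hypothetical, and the 64-byte limit is taken from the observation above rather than from the endpoint configuration.

```python
from collections import deque

MAX_PACKET_SIZE = 64   # assumed limit, from the "64 bytes or less" experiment
QUEUE_CAPACITY = 256   # hypothetical queue size in bytes


class SendQueue:
    """Bounded byte queue that rejects a message when it would not fit."""

    def __init__(self, capacity=QUEUE_CAPACITY):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def push(self, message: bytes) -> bool:
        """Queue a whole message, or drop it if there is not enough room."""
        if len(self.buffer) + len(message) > self.capacity:
            self.dropped += 1  # models a "send queue full" log line
            return False
        self.buffer.extend(message)
        return True

    def drain(self, write, max_packet=MAX_PACKET_SIZE) -> int:
        """Pop everything queued and pass it to write() in bounded chunks."""
        sent = 0
        while self.buffer:
            size = min(max_packet, len(self.buffer))
            chunk = bytes(self.buffer.popleft() for _ in range(size))
            write(chunk)
            sent += size
        return sent


if __name__ == "__main__":
    queue = SendQueue()
    queue.push(b"x" * 150)
    chunks = []
    queue.drain(chunks.append)  # stand-in for the endpoint write
    print([len(chunk) for chunk in chunks])
```

Splitting on the write side keeps messages longer than 64 bytes intact in the queue, so a limit of this kind would not by itself require the emulator to produce shorter messages.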

peplin commented 11 years ago

If I don't try to pull data over USB, it works fine over Bluetooth.

peplin commented 11 years ago

I updated nxpusblib to 0.98 and it seems to be working OK!