Open dcyoung opened 7 years ago
Looks like the current implementation (sleep) is causing an issue. Therefore, adding "bug" label for now.
The example-c++
and example-viewer
are throwing an occasional (read ~10% of the time) error on program exit.
unable to receive stop scanning command response
From some early debugging, it looks like this is because the sleep duration in the sweep_device_stop_scanning
method is occasionally too short. Increasing the duration gets rid of the errors, but is NOT the proper fix. More reason to implement a proper parse in sweep_device_stop_scanning
.
Note: This parsing requirement may have implications for how we design the buffer/queue for the worker thread. See #31
Care to specify how to check for values in byte 0? From what I can tell looking at
it's not yet clear to me what "specific specific error codes" look like. Especially the "reserved for future use" part is a bit hand-wavy for me.
I'm with you that what we're doing is a bit ugly waiting for some time and then trying again. But if we can simply wait 0.005s (or 2x or 10x as long - can we calculate this?) and it will just work, I'm also fine with increasing the time if it makes it easier for us.
Sure thing.
We use the 1st transmitted BYTE in a data block to encode both the sync value and a potential error code. Since the sync value is binary, we only use the least significant BIT in the byte to encode the sync. We call this the sync bit
. The remaining 7 BITS (e0:e6
) can then be used to transmit an error code.
Therefore, we can parse the sync value by looking at just the LSB, and parse the error code by considering the other 7 bits as a 7 bit integer. 7 bits leaves us with 2^7 = 128 possible error codes we could send.
Currently, the device firmware only supports one kind of error (a communication error with the LiDAR module). Therefore the only error message that we currently need to parse out of the 7 higher order bits (e0:e6) is the case where e0 has a binary value of 1, indicating a communication error.
example:
// Consider the DATA_BLOCK contents...
// check that the error bits (e0-6) are all 0 (ie: DATA_BLOCK contains NO ERRORS)
if ( !(responses[received].sync_error >> 1) ) {
// Check if DATA_BLOCK is a sync reading (ie: first reading of a new 360deg scan)
// & if we've already been gathering DATA_BLOCKs
if ( responses[received].sync_error == 1 && received > 0) {
// Note the end of the scan
last = received;
//FIXME: add check that "last" does not exceed SWEEP_MAX_SAMPLES
// BREAK to return the scan (NOT including this DATA_BLOCK)
break;
}
}
else {
// some error
//FIXME: Handle each type of error code and shutdown gracefully if need be.
switch(responses[received].sync_error >> 1 ) {
case ...
}
}
What this issue proposes, is that we never send an error code that could produce the ASCII value 'D'. We would lose 1 of our 2^7=127 possible error codes, but I think that 127-1=126 is more than enough error codes for our needs... seeing as we are only using 1 currently.
This guarantee that we'll never receive a Data Block that starts with an ASCII "D", combined with some other guarantees I outlined in the original post, is enough criteria to reliably parse the incoming data blocks for a 'DX' receipt.
Update: we implemented the error code check in https://github.com/scanse/sweep-sdk/pull/45.
We're now sleeping for 20ms and at least with my devices it seems like this is good enough.
What is your take here, should we go the route of parsing the packets and handling it ourselves? I'm more than happy with the simply workaround (as long as it works reliably).
Sending a
DX
command (host->sensor) to a streaming sensor stops data acquisition. However, this doesn't happen instantaneously. Once theDX
command is sent out, the host will still receive a fewData Block
receipts before seeing theDX
receipt.Currently the
sweep_device_stop_scanning
uses the following technique:DX
command (host -> sensor)Data Block
receipts and theDX
receipt to arrive (sensor -> host).DX
command (host -> sensor)DX
receipt (sensor -> host) that returns as a response to step 4This was really a temporary workaround and should be replaced by an actual check for the
DX
receipt among the incomingData Block
receipts. The initial reaction might be to assume we cannot reliably check for aDX
receipt among incomingData Block
receipts because the bytes of aData Block
are variable. The fear is that they could theoretically take the appearance of aDX
receipt. However, we can still reliably discern the two because:Data Block
bytes 1-6 can take any form, byte 0 (sync/error byte) is limited to very specific error codes and is therefore controllable. We can guarantee that byte 0 will never look like the first byte of aDX
receipt.Data Block
receipts before sending aDX
receipt. That is to say that aDX
receipt will never come mid-Data Block
.Data Block
receipts are transmitted after theDX
receipt is transmitted.Obviously we can't protect against data corruption with 100% reliability, but we can certainly protect against valid
Data Block
receipts being misinterpreted as aDX
receipt.In
sweep_device_stop_scanning
, we should do something along the lines of:DX
command (host -> sensor)Data Block
receipt or aDX
receipt.Data Block
receipt, handle it... (as we would during scanning, or simply trash it). In the case of aDX
receipt, we proceed as if we picked up from step 5 above.