SerialDispatcherP can lose packets, loses double buffering, and doesn't implement FIFO correctly

tinyos-issues commented 11 years ago

Original author: cire...@gmail.com (October 18, 2010 00:21:59)

SerialDispatcherP implements a double buffer receive queue for incoming serial packets. This double buffer queue is intended to operate in a FIFO manner.

SerialDispatcherP is implemented as two major parts. The first part operates at the async (interrupt) level and is responsible buffer allocation, byte reception, buffer completion. The second part is responsible for packet delivery and operates at the task level.

Two buffer slots are provided and this implements the double buffering. One buffer can be in delivery while another buffer is receiving an incoming packet. Under heavy loads both buffers will be in delivery and new incoming data should be discarded.

Buffers handed off to the task layer for delivery should be handled in a strictly FIFO order.

The current implemenation of SerialDispatcherP (rev 5204) only provides a single set of state information and can only convey information about one buffer slot to the delivery task.

What steps will reproduce the problem?

Single step the code and set up a scenerio where the 2nd buffer is signalled complete prior to the receiveTask running (key variable is receiveTaskPending).

This problem exists because there isn't sufficient state to properly run the 2 slot FIFO queue in strict FIFO order while maintaining proper mutual exclusion.

This problem is a race condition. If the receiveTask executes (and clears receiveTaskPending) prior to a new packet being signalled as available then no problem occurs.

What is the expected output? What do you see instead?

1st packet arrives and completion is signalled to SerialDispatcherP. Buffer is handed off to receiveTask and receiveTaskPending set TRUE. This opens the window. Assume that prior to delivery being complete by receiveTask (clearing of receiveTaskPending), another packet is received and signalled.

Following problems occur:

1) Since there is only one set of state vars for handing information to receiveTask, meta info about the new packet is lost (type, which buffer, etc).

2) The packet in the 2nd buffer is never processed (hence lost). The receiveTask has no way of knowing that another buffer is available.

3) The buffer becomes unavailable for any future packets. It is never freed again and the system drops to one available buffer.

What version of the product are you using? On what operating system?

TinyOS-main SVN rev 5204 (hinrg git: c143598), 10/15/2010

Please provide any additional information below.

Original issue: http://code.google.com/p/tinyos-main/issues/detail?id=2

tinyos-issues commented 11 years ago

From mmar...@gmail.com on October 18, 2010 02:54:58 Hi! Can you please check if this reimplementation of the serial stack works for your load?

http://szte-wsn.cvs.sourceforge.net/viewvc/szte-wsn/tinyos/tos/lib/FastSerial/

(You do not need the Atm128UartP.nc and Msp430UartP.nc files, those will just make stuff a little faster by eliminating multi byte sends).

Miklos

tinyos-issues commented 11 years ago

From cire...@gmail.com on October 18, 2010 04:11:29 I've looked at that code and am unclear of how to use it to replace SerialDispatcherP/SerialP/HdlcTranslate. Not use instead of but to replace the functionality of to be backward compatible with existing code.

It is structurally very different than what SerialDispatcherP is trying to do.

Rather I'm fixing SerialP and SerialDispatcherP and have proposed code which I will be posting shortly. In fact I'd be pleased if you'd review and test it.

tinyos-issues commented 11 years ago

From mmar...@gmail.com on October 18, 2010 20:45:37 You just have to copy all files there to a directory and include that in your makefile. It replaces the SerialDispatcher code and gives some completely different implementation below that.

Anyways, if you can fix the current stack with little changes, then it is fine.

Best, Miklos

tinyos-issues commented 11 years ago

From cire...@gmail.com on October 20, 2010 10:08:50 Attached is proposed code to fix the SerialDispatcherP problems listed above.

This code fully implements a two slot FIFO protected buffering system. It includes appropriate state variables to insure that packet are received and delivered appropriately.

This code has been extensively unit tested using GDB/JTAG and a msp430F2618 mote. All identified corner cases have been examined by working through the code while single stepping.

This code should be used in conjuction with SerialP corrections contained with Issue 1.

tinyos-issues commented 11 years ago

From schip...@etantdonnes.com on October 27, 2010 18:47:07 TOS Serial code review Michael Schippling Oct 26, 2010

As per instruction from Eric Decker I have examined the Receive thread through these two files which are candidates for checkin to the TOS2 project. The "new" versions were on the serial_rx_fix branch of the tinyos-2.x tree and were compared to the "old" versions which are on the main branch: SerialP_nc_new.txt SerialDispatcherP_nc_new.txt

1) Presuming that we are doing Software Engineering this code should reference the appropriate TEP 113 document trail which provides a Functional Specification. The two files in question implement the Protocol and Dispatcher components of the standard serial stack for TOS2 as described in the TEP. The associated Requirements, Design and Test Plan documents are well enough hidden that I can't find them.

2) In the absence of Design documentation I would like to see comments in the source files describing the state machines and transitions related to the state variables: rslot_t.rslot_state, rx_state, and sendState in SerialDispatcherP and rxState, txState, and tx_buf_t.state in SerialP I believe this would make it easier for those who come after us to understand the code's function.

3) Just a comment on the partitioning of the higher level components... Since the message_t headers are protocol dependent and are accessed at multiple levels, would it have been "better" to indicate the header type in the overall message protocol byte -- labeled "Protocol byte" in section 3.6 of the TEP -- and handle it all in SerialP, rather than splitting the reformatting between SerialP and SerialDispatcher? I think this would also make swapping around from radio to serial message paths somewhat easier.

4) in SerialP, general RX buffer management: a) I think you can eliminate the +1 from the char buffer in rx_buf_t at line 133, and change the tests in push and pop from > to == at lines 311 and 323. b) rx_buffer_is_full() and rx_buffer_is_empty() are not used. c) One might be able to say the same thing about the TX buffers.

5) In SerialP rx_state_machine(...) starting at line 433: a) I am not comfortable with having multiple return paths from the switch(rxState){}, especially because the normal "break" path is for errors and the "all done" path is handled with a goto. Adding an error-state local variable might appease me... b) I got lost at line 487: if (rxByteCnt < 2 || rx_current_crc() != rxCRC) because I didn't initially understand that one needs to remain two bytes behind in order to calculate the CRC correctly. A comment to this effect and a define like CRC_BYTE_LENGTH in place of the 2 here and on line 487 would make the whole 2 byte buffering thing much clearer. c) I believe the rxInit() at line 492 could be moved to the "done:" section of the function, making the call at line 518 redundant. This would make it clearer that the state machine is indeed reset and ready to roll again. d) The whole "offPending" thing at line 522 looks fishy, but I guess it's needed to be able to shutdown cleanly. I had trouble following the logic, so maybe another state-transition comment is in order. e) And possibly a comment at line 529 pertaining to how multiple SYNC delimiters between messages are optional. This is covered in the HDLC RFC1662 document but it's a long trail to follow...

6) In SerialDispatcherP in general: a) This comment at line 132 seems dis-ingenuous in light of there being three levels of function calls and buffer manipulations per byte received:

...to minimize the overhead of structure dereference... c) I have no idea what this COUNT_NOK... syntax at lines 136 and 137 does: uint8_t* COUNT_NOK(sizeof(message_t)) rx_buffer; d) The ReceiveBytePacket event functions could be coalesced into a single state machine function with multiple calls as done in SerialP circa line 411. This would keep all the transitions in one place.

7) in SerialDispatcherP rxTask() starting at line 270: a) Is it guaranteed OK to do a return from an atomic{} block? b) You may be able to put everything but the check for FULL at line 278 outside the atomic. Once it's established that the buffer is FULL no one else should be touching it, no? c) Since only this function manipulates C_rx_slot, I think line 306 which re-gets the rs slot pointer is redundant. d) At line 304: msg = signal Receive.receive[proto](msg, msg, size); I presume that the receive() is supposed to return the msg buffer that it got -- as with T1's radio.receive(). If it doesn't you end up over-writing your rs->msg buffer at line 307 with what may be garbage. Doing it this way allows the receive() user some control over buffering but there could be consequences. e) Again I think the only thing that really needs to be in the atomic block is the check for FREE at line 313.

8) in SerialDispatcherP ReceiveBytePacket.startPacket() starting at line 330: a) All the dispatcherr* variables are not ever used or reset... b) Line 346: rx_index = 0; is redundant because rx_index is reset later in byteReceived() at line 360. c) Some more careful analysis (than I'm capable of) of what needs to be in the atomic could be done.

9) in SerialDispatcherP ReceiveBytePacket.byteReceived(...) starting at line 353: a) Again having returns in the switch makes me nervous. c) And again, some more careful analysis (than I'm capable of) of what needs to be in the atomic could be done.

10) in SerialDispatcherP ReceiveBytePacket.endPacket(...) starting at line 400...see #9, E.g., all the P_rx_slot manipulation happens in this function and I think overlapping references are prevented by the lower level access pattern.

tinyos-issues commented 11 years ago

From philip.l...@gmail.com on October 27, 2010 21:56:54 This fix has one of the same concerns as those in issue 1: use of enums waste memory.

This seems like a pretty subtle bug. The current implementation is such that if you receive a second packet before the first is signaled, you discard it. The assumption in the code is that you want to keep this buffer free in case a new packet comes in. Clearly a better way would be to discard only on handling startPacket for a new, third buffer. But this raises a more fundamental question: should the queue be FIFO, drop-tail, or should it prefer newer packets?

tinyos-issues commented 11 years ago

From cire...@gmail.com on October 28, 2010 09:03:19 SerialDispatcherP implements a double buffer receive queue that implements a FIFO queue. The premise of the bug report states that both buffers are full and that is what precipitates the problem.

However, the current implementation never queues up two buffers rendering the premise invalid.

The proposed code does implement a two element FIFO that can be easily extended to multiple slots. With two elements this implements the discard on startPacket for the third to be received mentioned above.

tinyos-issues commented 11 years ago

From philip.l...@gmail.com on November 13, 2010 01:13:37 Eric -- your most recent email suggests this is isn't a problem. Can you verify?

tinyos-issues commented 11 years ago

From cire...@gmail.com on November 13, 2010 01:44:03 This issue can be closed. The analysis of the problem involved a premise that isn't true:

"This problem will only cause a failure iff there are two buffers at the SerialDispatcherP layer waiting to be delivered."

SerialDispatcherP does not allow for two packets to be pending at the same time rendering the analysis moot.

Since two buffers can't ever be pending, there is no need for state information about both buffers.

tinyos / tinyos-main

SerialDispatcherP can lose packets, loses double buffering, and doesn't implement FIFO correctly #5