Open BLuedtke opened 1 year ago
Admittedly, I'm a bit confused as to why the check for presence of addr_stack
in stall_affected_nodes_queue
is negated.
With what happens in the if-body, aka the addr_stack
being added to said queue, a second call to bidib_node_stall_ready
will return true, even though the node is still stalled - since the address is now present in the queue.
Only one thing prevents bidib from (incorrectly) sending a message over a stalled node: All invocations of bidib_node_stall_ready
also check that the node's message queue is empty. However, in function bidib_node_try_send
, a message is added to this queue when the node is stalled (i.e., on the first call to bidib_node_stall_ready
where it'll return false as expected). In the next attempt to send a message (via bidib_node_try_send
), the node is recognized as "not ready" purely because the message queue is not empty.
I can only guess the original intention. One possibility is that this should've been two checks: One check for the "stall" member of the node_state. Then, if the node is stalled, but the address of the stalled node or supernode is not yet present in the stall_affected_nodes_queue
, add the address to the queue.
Reference to MSG_STALL in the bidib protocol: http://bidib.org/protokoll/bidib_general_messages.html
This line looks suspicious, should probably be message[data_index]
:
https://github.com/uniba-swt/libbidib/blob/9f3942731501f0c70d433886098d5e0b0c9d4bab/src/transmission/bidib_transmission_receive.c#L353
The idea of stall_affected_nodes_queue
was probably to implement a way to check whether a super-node is has been stalled when trying to send a command to one of its sub-nodes.
When a MSG_STALL
is received, bidib_node_update_stall(addr_stack, stall_status)
is called. The addr_stack
contains the address of the node that sent MSG_STALL
and also the addresses of its super-nodes. Because addr_stack
is a fixed array of size 4, the unused elements above the top of the stack are represented by 0x00
. If a node determines that it needs to stall to avoid overwhelming its sub-nodes, then we should pause any sending of commands to the node and its sub-nodes.
Take the function discussed above in https://github.com/uniba-swt/libbidib/issues/28#issue-1621654175: https://github.com/uniba-swt/libbidib/blob/9f3942731501f0c70d433886098d5e0b0c9d4bab/src/transmission/bidib_transmission_node_states.c#L108-L130
When a stalled node is found, the original addr_stack
is pushed into the node's stall_affected_nodes_queue
: https://github.com/uniba-swt/libbidib/blob/9f3942731501f0c70d433886098d5e0b0c9d4bab/src/transmission/bidib_transmission_node_states.c#L119
Otherwise, we reach the for-loop that zeros out the top of the addr_cpy
. The copy of the address stack now addresses a super-node closer to the root node (addr_cpy[0]
is the root node):
https://github.com/uniba-swt/libbidib/blob/9f3942731501f0c70d433886098d5e0b0c9d4bab/src/transmission/bidib_transmission_node_states.c#L122-L127
The outer while-loop repeats until a stalled node is found or all nodes in addr_stack
have been considered.
As a result, the stall_affected_nodes_queue
of a node may contain an address to itself and/or addresses to its sub-nodes. When a node is no longer stalled, we should try and resume the sending of commands to its sub-nodes and itself:
https://github.com/uniba-swt/libbidib/blob/9f3942731501f0c70d433886098d5e0b0c9d4bab/src/transmission/bidib_transmission_node_states.c#L232-L237
When sending a message via a node, first bidib determines if the node is ready to send.
This involves, among other things, checking whether the node is stalled, or, whether a supernode of the node is stalled. The stall check is performed in a function here.
If the node or a supernode is stalled, the linked function returns false. A node is stalled if the member "stall" is true and the address_stack (input param to linked fct) is NOT contained in the member stall_affected_nodes_queue.
Thus, the linked fct attempts to find the addr_stack in the queue stall_affected_nodes_queue by using g_queue_find, see line 114. As the addr_stack is an array of 4 uint8_t's, a custom comparison function is used, as the comparison shall be element-wise, not a pointer comparison. This comparator is called
bidib_node_stall_queue_entry_equals
(see line 99ff.).The call to
g_queue_find
looks as follows:!g_queue_find(state->stall_affected_nodes_queue, bidib_node_stall_queue_entry_equals)
.Note that the element to find is not actually given here! This code just always returns true, the comparator is never called.
g_queue_find
is supposed to be called with the queue and the element to find (see here). To use a custom comparator, we need to useg_queue_find_custom
(see here). It shall be called like this:!g_queue_find_custom(state->stall_affected_nodes_queue, addr_stack, (GCompareFunc)bidib_node_stall_queue_entry_equals)
.The unit tests related to this part (bidib_send_tests) pass as the "stall" member is apparently sufficient to indicate whether the node is stalled or not. However, there's probably a reason for the existence of the stall_affected_nodes_queue, so this comparison call should be fixed.