I found 2 issues that concerns the function _send_and_receive in interfaces/rmcp.py.
rx_data could be not assigned
What's wrong
If an IPMI device timed out multiple times (its firmware has an issue, it becomes not available on the network for whatever reason ...), a socket.timeout exception will be raised during self._send_ipmi_msg(tx_data) and we increment the retry counter.
Now if we do this until retry > self.max_retries, we will get out of the while loop and the rx_data will not be set, resulting in an exception during the final return rx_data[6:-1].
Moreover, we will loose the information that the IPMI device was unreachable because it timed out multiple times
Potential fix proposition
When we get out of the loop, we can check if retry > self.max_retries and raise a sort of MaxRetryExceeded or Timeout exception that can be catch elsewhere if needed.
We can also add some details in an "error" variable if we get inside the except socket.timeout block, and include this error variable inside the MaxRetryExceeded details at the end if we raise it.
errors = ""
...
except socket.timeout as exc:
retry += 1
errors += "error in _send_and_receive: " + exc.__str__()
if retry > self.max_retries:
raise MaxRetryExceeded("custom exception", errors)
return rx_data[6:-1]
The transaction can infinite loop
What's wrong
I currently have a weird bug during a heavy load on pyipmi library where some of my IPMI requests results in a NetFn mismatch, sequence number mismatch and command id mismatch at the same time. The issue is not about having these mismatch, but the behavior of the transaction retry while received is False: :
we assign the variable rx_data with the received IPMI message
we check for data integrity in received = rx_filter(header, rx_data) -> assume it fails because of the misatches
we put back rx_data on the queue in self._q.put(rx_data)
we are back at the beginning of the loop, with rx_data being the exact same as before because we get it from the queue this time
rx_filter check will fail again because neither header nor rx_data are different from before
... -> infinite loop
Potential fix proposition
This while loop with the Queue workflow will only work for bridged message, any other message will not work because rx_data will be the same at every iteration. If the check rx_filter() fail once, it will fail every time forever.
We can add a counter just like self.max_retries and raise an exception if we exceed it, this way we will stop at a given time.
We can also say that a message that is not a bridged message should not fail the rx_filter() check, thus we can immediately raise an exception :
if received is False and array('B', rx_data)[5] != constants.CMDID_SEND_MESSAGE:
raise Exception("custom exception here")
I'm totally willing to contribute to this if needed, just not 100% sure what's the best approach / how do you want to address these. I'm also aware that these are quite hard to reproduce but I hope my explanations were clear enough to "see" the bug without having to test it.
https://github.com/kontron/python-ipmi/blob/8e83f6f1de1c774cc9bdce90955b9917674351d4/pyipmi/interfaces/rmcp.py#L590-L618
I found 2 issues that concerns the function
_send_and_receive
ininterfaces/rmcp.py
.rx_data
could be not assignedWhat's wrong
If an IPMI device timed out multiple times (its firmware has an issue, it becomes not available on the network for whatever reason ...), a
socket.timeout
exception will be raised duringself._send_ipmi_msg(tx_data)
and we increment theretry
counter.Now if we do this until
retry > self.max_retries
, we will get out of the while loop and therx_data
will not be set, resulting in an exception during the finalreturn rx_data[6:-1]
.Moreover, we will loose the information that the IPMI device was unreachable because it timed out multiple times
Potential fix proposition
When we get out of the loop, we can check
if retry > self.max_retries
and raise a sort ofMaxRetryExceeded
orTimeout
exception that can be catch elsewhere if needed.We can also add some details in an "error" variable if we get inside the
except socket.timeout
block, and include this error variable inside theMaxRetryExceeded
details at the end if we raise it.The transaction can infinite loop
What's wrong
I currently have a weird bug during a heavy load on pyipmi library where some of my IPMI requests results in a
NetFn mismatch
,sequence number mismatch
andcommand id mismatch
at the same time. The issue is not about having these mismatch, but the behavior of the transaction retrywhile received is False:
:rx_data
with the received IPMI messagereceived = rx_filter(header, rx_data)
-> assume it fails because of the misatchesrx_data
on the queue inself._q.put(rx_data)
rx_data
being the exact same as before because we get it from the queue this timerx_filter
check will fail again because neitherheader
norrx_data
are different from beforePotential fix proposition
This while loop with the Queue workflow will only work for bridged message, any other message will not work because
rx_data
will be the same at every iteration. If the checkrx_filter()
fail once, it will fail every time forever.We can add a counter just like
self.max_retries
and raise an exception if we exceed it, this way we will stop at a given time. We can also say that a message that is not a bridged message should not fail therx_filter()
check, thus we can immediately raise an exception :I'm totally willing to contribute to this if needed, just not 100% sure what's the best approach / how do you want to address these. I'm also aware that these are quite hard to reproduce but I hope my explanations were clear enough to "see" the bug without having to test it.