https://github.com/kontron/python-ipmi/blob/8e83f6f1de1c774cc9bdce90955b9917674351d4/pyipmi/interfaces/rmcp.py#L590-L618

I found 2 issues that concerns the function _send_and_receive in interfaces/rmcp.py.

`rx_data` could be not assigned

What's wrong

If an IPMI device timed out multiple times (its firmware has an issue, it becomes not available on the network for whatever reason ...), a socket.timeout exception will be raised during self._send_ipmi_msg(tx_data) and we increment the retry counter.

Now if we do this until retry > self.max_retries, we will get out of the while loop and the rx_data will not be set, resulting in an exception during the final return rx_data[6:-1].

Moreover, we will loose the information that the IPMI device was unreachable because it timed out multiple times

Potential fix proposition

When we get out of the loop, we can check if retry > self.max_retries and raise a sort of MaxRetryExceeded or Timeout exception that can be catch elsewhere if needed.

We can also add some details in an "error" variable if we get inside the except socket.timeout block, and include this error variable inside the MaxRetryExceeded details at the end if we raise it.

    errors = ""
    ...
    except socket.timeout as exc:
        retry += 1
        errors += "error in _send_and_receive: " + exc.__str__()

if retry > self.max_retries:
    raise MaxRetryExceeded("custom exception", errors)

return rx_data[6:-1]

The transaction can infinite loop

What's wrong

I currently have a weird bug during a heavy load on pyipmi library where some of my IPMI requests results in a NetFn mismatch, sequence number mismatch and command id mismatch at the same time. The issue is not about having these mismatch, but the behavior of the transaction retry while received is False: :

we assign the variable rx_data with the received IPMI message
we check for data integrity in received = rx_filter(header, rx_data) -> assume it fails because of the misatches
we put back rx_data on the queue in self._q.put(rx_data)
we are back at the beginning of the loop, with rx_data being the exact same as before because we get it from the queue this time
rx_filter check will fail again because neither header nor rx_data are different from before
... -> infinite loop

Potential fix proposition

This while loop with the Queue workflow will only work for bridged message, any other message will not work because rx_data will be the same at every iteration. If the check rx_filter() fail once, it will fail every time forever.

We can add a counter just like self.max_retries and raise an exception if we exceed it, this way we will stop at a given time. We can also say that a message that is not a bridged message should not fail the rx_filter() check, thus we can immediately raise an exception :

if received is False and array('B', rx_data)[5] != constants.CMDID_SEND_MESSAGE:
    raise Exception("custom exception here")

I'm totally willing to contribute to this if needed, just not 100% sure what's the best approach / how do you want to address these. I'm also aware that these are quite hard to reproduce but I hope my explanations were clear enough to "see" the bug without having to test it.

kontron / python-ipmi

Transactions retry unassigned variable and infinite loop #159

`rx_data` could be not assigned

What's wrong

Potential fix proposition

The transaction can infinite loop

What's wrong

Potential fix proposition

kontron / python-ipmi

Transactions retry unassigned variable and infinite loop #159

rx_data could be not assigned

What's wrong

Potential fix proposition

The transaction can infinite loop

What's wrong

Potential fix proposition

`rx_data` could be not assigned