We are seeing a few different types of errors originating from the iPerf test in the Netrics code. Some of these errors are expected, such as the error indicating that the test server was busy and that the device initiating the test will back off and try again (e.g., 2022-07-22T14:51:04.776 ERROR {netson} [iperf3_bandwidth] 1 / 4: iperf3: error - the server is busy running a test. try again later).
However, we are also catching errors that are not explained by the test server being busy running a test for another device. These errors include:
2022-07-22T01:31:10.735 ERROR {netson} [iperf3_bandwidth] iperf3: error - unable to send control message: Connection reset by peer (see, e.g., nm-mngd-20210927-d8cfa28a)
2022-07-22T02:50:10.245 ERROR {netson} [iperf3_bandwidth] iperf3: error - unable to send control message: Broken pipe (see, e.g., nm-mngd-20220505-7db0ce31)
2022-07-22T03:02:23.926 ERROR {netson} [iperf3_bandwidth] iperf3: error - control socket has closed unexpectedly (see, e.g., nm-mngd-20210924-872de77e)
2022-07-22T03:14:56.191 ERROR {netson} [iperf3_bandwidth] iperf3: error - unable to read from stream socket: Resource temporarily unavailable (see, e.g., nm-mngd-20210915-8bb64c9a)
Are these errors also caused by the test server receiving simultaneous test requests from multiple devices? Can we treat them the same way we treat the "server is busy, try again later" error? Or is there something else going on here that we should look into?
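To make the question concrete, here is a minimal sketch of how the two groups of messages could be told apart and retried. It is not the existing Netrics retry logic: the function names, the category labels, and the backoff parameters (`base`, `cap`) are all hypothetical, and only the message substrings are taken from the log lines above.

```python
import random
import re

# Hypothetical category labels, not part of Netrics or iperf3.
BUSY = "busy"              # expected: server is running another test
CONNECTION = "connection"  # control/stream socket failures
UNKNOWN = "unknown"

# Substrings copied from the log lines quoted in this issue.
_PATTERNS = [
    (BUSY, re.compile(r"the server is busy running a test")),
    (CONNECTION, re.compile(r"unable to send control message")),
    (CONNECTION, re.compile(r"control socket has closed unexpectedly")),
    (CONNECTION, re.compile(r"unable to read from stream socket")),
]


def classify_iperf3_error(message):
    """Return the category of an iperf3 error message."""
    for category, pattern in _PATTERNS:
        if pattern.search(message):
            return category
    return UNKNOWN


def backoff_delay(attempt, base=5.0, cap=120.0):
    """Seconds to wait before retry number `attempt` (starting at 1).

    Exponential backoff with full jitter, so that devices that failed
    at the same moment do not all retry at the same moment.
    """
    return random.uniform(0, min(cap, base * 2 ** (attempt - 1)))
```

Keeping the socket errors in their own category would let them be retried like the busy error while still being counted separately, which might help show whether they track server load or something else.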
The "server is busy" message is typical of a single server receiving multiple connection requests.
The other messages sound like a problem with a server socket. I know Jamie has a script that runs a reset every once in a while; maybe this has something to do with it, maybe not.
The lock-up of the iperf servers was noticed, iirc, after oplat was deployed. We need to understand whether oplat's use of the iperf lib could be causing unintended problems on the server.
In general, more servers are needed, and if this is important, some investment might be required.