Open leiyu-bytedance opened 3 years ago
The issue occurs when there are multiple writes to the kcs device without a read in between. The kcs driver shows that once the kcs device has been written to, it can NOT be written again:
    if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
        kcs_bmc->phase = KCS_PHASE_READ;
        kcs_bmc->data_out_idx = 1;
        kcs_bmc->data_out_len = count;
        memcpy(kcs_bmc->data_out, kcs_bmc->kbuffer, count);
        write_data(kcs_bmc, kcs_bmc->data_out[0]);
        ret = count;
    } else {
        ret = -EINVAL;
    }
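To make the failure mode concrete, here is a minimal userspace sketch of the phase check quoted above. It is not the kernel code: `struct kcs_model`, `kcs_model_write`, the 64-byte buffer, and the `KCS_PHASE_IDLE` value are simplifications assumed for illustration. Only `KCS_PHASE_WAIT_READ`, `KCS_PHASE_READ`, and the `-EINVAL` result come from the snippet.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the driver's write path. */
enum kcs_phase { KCS_PHASE_IDLE, KCS_PHASE_WAIT_READ, KCS_PHASE_READ };

struct kcs_model {
    enum kcs_phase phase;
    unsigned char data_out[64]; /* assumed size, for illustration only */
    size_t data_out_len;
};

/*
 * Mirrors the check quoted above: a write is only accepted while the
 * device is in KCS_PHASE_WAIT_READ. Accepting it moves the phase to
 * KCS_PHASE_READ, so a second write without a new request in between
 * is rejected with -EINVAL.
 */
static long kcs_model_write(struct kcs_model *kcs,
                            const unsigned char *buf, size_t count)
{
    if (count == 0 || count > sizeof(kcs->data_out))
        return -EINVAL;
    if (kcs->phase != KCS_PHASE_WAIT_READ)
        return -EINVAL;

    kcs->phase = KCS_PHASE_READ;
    kcs->data_out_len = count;
    memcpy(kcs->data_out, buf, count);
    return (long)count;
}
```

Starting from `KCS_PHASE_WAIT_READ`, the first call to `kcs_model_write` returns the byte count and the second returns `-EINVAL`, which matches the behavior described in this issue.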
It can also happen if the host decides to reset the state machine before the bridge finishes writing its response, and before a new command has been sent down the channel and read().
I fixed some edge cases around this behavior in a rewrite of the bridge https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/src/main.cpp#L127
However, I still haven't reworded the EINVAL error message to make it clear that this is a normal (non-exceptional) error case that the host can trigger.
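One way a bridge could treat EINVAL as a normal case is to classify the result of the response write rather than logging every failure as an error. The sketch below is an assumption for illustration, not the google-misc implementation: `enum rsp_result`, `classify_rsp_write`, and `send_rsp` are hypothetical names, and it assumes the response is sent with a plain POSIX `write()` on the kcs device fd.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome categories for a response write. */
enum rsp_result {
    RSP_SENT,            /* whole response accepted by the driver */
    RSP_DROPPED_BY_HOST, /* driver no longer expects a response */
    RSP_FAILED           /* anything else: a real error */
};

/* Classify a write() result; err is the errno saved right after the call. */
static enum rsp_result classify_rsp_write(ssize_t ret, int err, size_t len)
{
    if (ret >= 0 && (size_t)ret == len)
        return RSP_SENT;
    if (ret < 0 && err == EINVAL)
        return RSP_DROPPED_BY_HOST; /* e.g. host reset the state machine */
    return RSP_FAILED;
}

/* Write the response and classify the result in one step. */
static enum rsp_result send_rsp(int fd, const unsigned char *buf, size_t len)
{
    ssize_t ret = write(fd, buf, len);
    int err = (ret < 0) ? errno : 0; /* save errno before any other call */

    return classify_rsp_write(ret, err, len);
}
```

A caller could log `RSP_DROPPED_BY_HOST` at debug or info level and reserve the error level for `RSP_FAILED`, so a host-initiated reset no longer looks like a bridge fault.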
Good to know there is another implementation of kcsbridge.
Comparing the code, it looks like:
@wak-google Could you kindly tell us what the edge cases are? And will you push Google's rewrite of kcsbridge to the community?
We have verified that the rewrite of kcsbridge in https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/ works fine. The issue does not reproduce with it.
I would propose that Google push the rewrite upstream :)
The kcsbridge intermittently gets a "Failed to send rsp msg" error, where the return value is 0 and the ERROR is "Invalid argument".
The full json-pretty output is: