openbmc / kcsbridge

Apache License 2.0
3 stars 4 forks source link

Failed to send rsp msg intermittently #1

Open leiyu-bytedance opened 3 years ago

leiyu-bytedance commented 3 years ago

The kcsbridge gets Failed to send rsp msg error intermittently, where the return value is 0 and the ERROR is "Invalid argument"

The full json-pretty output is:

{
        "CODE_LINE" : "76",
        "_BOOT_ID" : "fc13807c5e13439983e879841967cb6a",
        "_SYSTEMD_INVOCATION_ID" : "d47c47ccf8654964beaea2e3b7962eb3",
        "_COMM" : "kcsbridged",
        "TRANSACTION_ID" : "65467255",
        "__CURSOR" : "s=61672c7be0e848ec99ac8c10313b1bc6;i=11fc;b=fc13807c5e13439983e879841967cb6a;m=2e3a80a8d;t=5bee385a56da0;x=b2bef7d1394a65f0",
        "ERROR" : "Invalid argument",
        "CMD" : "0x23",
        "_SYSTEMD_SLICE" : "system-phosphor\\x2dipmi\\x2dkcs.slice",
        "_MACHINE_ID" : "491b3315fc3749a0becbab8778243aff",
        "_CAP_EFFECTIVE" : "1ffffffffff",
        "SIZE" : "0",
        "__REALTIME_TIMESTAMP" : "1617259292618144",
        "LUN" : "0x00",
        "_TRANSPORT" : "journal",
        "__MONOTONIC_TIMESTAMP" : "12409375373",
        "SYSLOG_IDENTIFIER" : "kcsbridged",
        "_SYSTEMD_UNIT" : "phosphor-ipmi-kcs@ipmi_kcs3.service",
        "_EXE" : "/usr/bin/kcsbridged",
        "_SYSTEMD_CGROUP" : "/system.slice/system-phosphor\\x2dipmi\\x2dkcs.slice/phosphor-ipmi-kcs@ipmi_kcs3.service",
        "_UID" : "0",
        "_CMDLINE" : "/usr/bin/kcsbridged -c ipmi_kcs3",
        "EXPECT" : "10",
        "CC" : "0x00",
        "PRIORITY" : "3",
        "_GID" : "0",
        "CODE_FUNC" : "helper_log",
        "_HOSTNAME" : "g220a",
        "CODE_FILE" : "/usr/src/debug/phosphor-ipmi-kcs/1.0+gitAUTOINC+4a4d1d03d9-r1/recipe-sysroot/usr/include/phosphor-logging/log.hpp",
        "_PID" : "896",
        "_SOURCE_REALTIME_TIMESTAMP" : "1617259292617848",
        "MESSAGE" : "Failed to send rsp msg",
        "NETFN" : "0x0b"
}
leiyu-bytedance commented 3 years ago

The issue occurs when there are mutliple writes to the kcs device wihtout read. From the kcs driver, it shows that if one write to the kcs device, it can NOT be written again:

  1     if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
  2         kcs_bmc->phase = KCS_PHASE_READ;
  3         kcs_bmc->data_out_idx = 1;
  4         kcs_bmc->data_out_len = count;
  5         memcpy(kcs_bmc->data_out, kcs_bmc->kbuffer, count);
  6         write_data(kcs_bmc, kcs_bmc->data_out[0]);
  7         ret = count;
  8     } else {
  9         ret = -EINVAL;
 10     }
wak-google commented 3 years ago

It can also happen if the host decides to reset the state machine before the bridge finishes writing, but before a new command is sent down the channel and read().

I fixed some edge cases around this behavior in a rewrite of the bridge https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/src/main.cpp#L127

However, I still haven't made the EINVAL error message more clear as a normal (non-exceptional) error case that the host can direct.

leiyu-bytedance commented 3 years ago

It can also happen if the host decides to reset the state machine before the bridge finishes writing, but before a new command is sent down the channel and read().

I fixed some edge cases around this behavior in a rewrite of the bridge https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/src/main.cpp#L127

However, I still haven't made the EINVAL error message more clear as a normal (non-exceptional) error case that the host can direct.

Good to know there is another implementation of kcsbridge.

Comparing the code, it looks like:

@wak-google Could you kindly tell which the edge cases are? And will you push Google's re-write of kcsbridge to the community?

leiyu-bytedance commented 3 years ago

We have verified that the rewrite of kcsbridge in https://github.com/openbmc/google-misc/blob/master/subprojects/kcsbridge/ works fine. This issue is not reproduced.

I would propose that google push the re-write upstream :)

wak-google commented 3 years ago

Fixed via https://github.com/openbmc/kcsbridge/commit/03e6defcbca99c9c00cd37c4afb1d2b415a92acd