openbmc / obmc-console

OpenBMC host console infrastructure
Apache License 2.0
17 stars 25 forks source link

"Error reading from tty device" message on startup. #6

Open williamspatrick opened 8 years ago

williamspatrick commented 8 years ago

I sometimes observe the following on startup. Due to the restart policy, we get the service immediately restarted and it works fine after that.

Sep 17 00:54:32 palmetto sh[679]: obmc-console-server: Error reading from tty device: Success
adamliyi commented 7 years ago

Committed a fix at: https://gerrit.openbmc-project.xyz/#/c/848/ This issue is not easy to reproduce. I can sometimes observe the issue when:

  1. Change Host CPU state (obmcutil poweroff/on)
  2. Reboot bmc But this will not always trigger it. When the issue happens, poll() tty device will return revents value: 0x19. (looks to be POLLERR | POLLHUP | POLLIN). In my patch, I tried to:
  3. continue poll() when revents not POLLIN
  4. When handling POLLIN, if read() returns 0, just continue without break.

@williamspatrick , @jk-ozlabs . Please review.

adamliyi commented 7 years ago

Updated the patch - when there is revent besides POLLIN, we cannot do continue, otherwise we cannot poll clients (call_pollers(console)) - clients may hang there.

Is it OK just read from tty when revent==POLLIN?

I tested on Palmetto, and I cannot reproduce the issue any more, and obmc-console-client works well.

geissonator commented 7 years ago

Needs 1 more +1 and then merge. Please follow up with reviewers.

adamliyi commented 7 years ago

Hi @shenki , will you have a look and +1 :)

adamliyi commented 7 years ago

@jk-ozlabs commented in: https://gerrit.openbmc-project.xyz/#/c/848/, that "Why are you ignoring POLLERR & POLLHUP? This seems dangerous." I don't have strong reason on "why ignoring POLLERR & POLLHUP", so abandoned that patch. Looks to me, we need to root cause why there is POLLERR and POLLHUP from serial port to fix this issue. But I currently have no idea. Since systemd can restart console-server so this is not a blocking issue. I'd leave this issue open for now.

geissonator commented 7 years ago

Seem pretty open now, and priority isn't clear so just putting in actual for current sprint (not committed).

geissonator commented 7 years ago

@adamliyi - Patrick would like to see this one get resolved, please follow up with Joel on a correct solution this next sprint. Thanks!

mine260309 commented 7 years ago

From @jk-ozlabs, we have several commits under review: the patch series of https://gerrit.openbmc-project.xyz/#/c/2316 It's better to re-test this issue after the patches are submitted.