tumblr / genesis

A tool for data center automation
http://tumblr.github.io/genesis/
Apache License 2.0
155 stars 24 forks source link

Genesis run by init, terminals can tail the log #67

Closed nshalman closed 7 years ago

nshalman commented 7 years ago

Fixes #66

With this change applied, genesis kindly puts shells on the serial consoles.

byxorna commented 7 years ago

So, this needs testing, because of some interesting decisions that were made about how to launch Genesis bootloader on start. We have an auto login script, and Genesis is launched from a .bashrc or .profile iirc, with some locking to prevent multiple concurrent runs. With this change, will all the ttys attempt to log in and run the .profile? This is undesirable imho, and we should figure out how to make only a single instance launch, not tied to a (v)tty.

nshalman commented 7 years ago

@byxorna I have a bad feeling that you're correct that they all log in in a way that would be undesirable as you describe.

byxorna commented 7 years ago

Yea... I've been wanting to fix how Genesis bootloader launches, so this may be a good opportunity to do so. If we can get upstart to launch it, and still get output sent to vtty1 for operator troubleshooting, that would be perfect. No more interactive login hijinks

nshalman commented 7 years ago

Ideally the output should go to a runtime-configurable set of terminals. Some operators might want to watch the output over IPMI SoL or have it go to both SoL and the "VGA" so that it can be monitored from either depending on who's watching.

Perhaps the auto-login either drops to a shell, or tails a log file where the upstart launched automated run is logging?

I'll keep poking around the guts to see how much I can grok the automated bits.

byxorna commented 7 years ago

Yea, I like it! Too much of Genesis isn't easily overridable​ by admins making their own spin without getting into the guts. An easy override via kernel command line and /etc file would be dope. If you can figure out an elegant way to do this, that would be awesome!!

jyundt commented 7 years ago

Ideally the output should go to a runtime-configurable set of terminals. Some operators might want to watch the output over IPMI SoL or have it go to both SoL and the "VGA" so that it can be monitored from either depending on who's watching.

Could we make this default to "all consoles"? (Assuming the tasks in .bashrc/.profile can be extracted and executed only once.)

It would be handy to have the console available "everywhere" (VGA, SOL, etc) by default instead of having to override a bootarg + reboot if the console isn't where you expect it at first.

Yea, I like it! Too much of Genesis isn't easily overridable​ by admins making their own spin without getting into the guts. An easy override via kernel command line and /etc file would be dope.

:thumbsup:

nshalman commented 7 years ago

I have a working demo of using an init script to run the genesis bootloader under screen and a root profile that attaches to the screen session. If that is of interest I will push my new code for you lovely folks to review.

nshalman commented 7 years ago

Well, screen was heavy handed (as @jyundt mentioned to me privately.) The init script runs a wrapper which drops a pid file and redirects stdout and stderr to the log file before running the bootloader. All terminals log in and tail the log file until the pid obtained from the pidfile exits.

nshalman commented 7 years ago

One last force push included some cleanup after I'd tested it (don't ask how many cycles of typo fixing and rebuilds I went through...)

This is looking very nice to me. Feedback requested!

nshalman commented 7 years ago

Thanks for the feedback. I'll work on addressing your comments. I also found some odd behavior based on what console= flags are or aren't passed on the kernel command line that I'm trying to clean up. There's some other simplification that will happen as a result of that cleanup.

byxorna commented 7 years ago

@nshalman great, looking forward to it!

nshalman commented 7 years ago

I'm pushing a bunch of individual commits that address the many pieces of feedback I have received.

I believe this should all get squashed down to a single commit before merging, but this was cleaner for me to work on while addressing the feedback.

nshalman commented 7 years ago

I'll squash this all down to a single commit in the next day or two.

roymarantz commented 7 years ago

this is a great improvement over the current method. Why have the autologin of any ttys though? That was done as a trigger for starting genesis-bootloader, but since it is started by init that should not be needed. I'm even sure I'd have root's bashrc tail the log, but instead echo the command that would do that. i.e. I've gotten used to having to run a command after connecting, but as long as I could ^-C out of the tail it would be OK. My practice is usually to only watch the output when I need to interact with the server.

byxorna commented 7 years ago

@roymarantz operators use the default log tailing behavior to aid debugging and to quickly check in during induction without screwing with a keyboard

roymarantz commented 7 years ago

Even though it isn't my preference or use case, I'm OK with the auto tailing, but am uneasy with autologined connections being able to write. i.e. reading would be OK, but you should have to login before the terminal listens to you

byxorna commented 7 years ago

@roymarantz I agree with your hesitation here. Ideally, we would have a clean way to echo the log output straight to the TTY without any kind of shady autologin or similar. I don't really have a better idea than what we have currently though... Unless we configured Genesis to log with a specific syslog facility (i.e. USER7) and configured rsyslog statically to ship all logs tagged thusly to /dev/console as well as the normal log location (instead of relying on output redirection from the invoking bootstrap process). Now that I write it out... It sounds a bit more robust, no? Cc @nshalman

nshalman commented 7 years ago

The code has been squashed down. I'm unclear on how to handle the feedback from @roymarantz. As far as I can tell I've pretty accurately replicated the existing behavior as perceived by an operator with the improvement that it does the right thing with the serial port when I pass in the right console= settings on the kernel command line.

If there is a desire to implement a new feature where the behavior is different from the current behavior that is enabled by the fact that the bootloader is now elegantly launched by init rather than hackishly on a single TTY, that seems like a reasonable request to make, but I'm not sure it should block this incremental improvement.

byxorna commented 7 years ago

@nshalman I agree, it sounds outside the scope of this PR. It would be really nice to have though, so we should open a new ticket to track the move to a more robust system for showing Genesis behavior on console. Perhaps suggestions could be collected into a GH issue for this to be discussed further? @roymarantz do you concur?

nshalman commented 7 years ago

@byxorna, @roymarantz: Is there any more feedback that needs to be addressed before this can land?

byxorna commented 7 years ago

@nshalman i dont have the magic wizarding powers to perform arcane merges, so we need to wait for @roymarantz to land.

roymarantz commented 7 years ago

@nshalman sorry, but the more I think about it the more I can't have auto-login in my environment. If there were some way that could be optional (ideally with the default being not enabled) I would merge this.

I postulate that a boot option (visible in /proc/cmdline) would be one way to do it, alternatively a different boot image. Neither seem real appealing though. @byxorna can you come up with another way to do it?

Another possibility is have the init.d file spawn some background processes that tail -f the /var/log file, one for each console/tty. It shouldn't matter if any of the processes hang and I don't think they'll abort of the tty isn't connected.

nshalman commented 7 years ago

@roymarantz you currently have auto-login on /dev/tty1 (See here.) If the last task doesn't shut down or reboot, that tty gets dropped to a root shell.

I don't understand how this implementation is measurably worse in that regard.

Further improvement to change the default behavior to do something different will be simplified by the groundwork this change puts into place.

If I were adding auto-login where it wasn't already present I would understand your objection and would write further code before requesting a merge.

roymarantz commented 7 years ago

Interesting. I guess I'm "lucky" that it doesn't work in my environment. Does it work in yours?

nshalman commented 7 years ago

I thought about this, did some experiments, and finally hit upon a workable and relatively simple solution. The automatic login and log tailing only happens if the kernel command line contains the string GENESIS_AUTOTAIL

Note that this definitely changes the default behavior of the Genesis live environment on tty1.

nshalman commented 7 years ago

I'm again leaving these as separate commits for the moment for review, but they should be squashed down before being applied.

roymarantz commented 7 years ago

Thanks @nshalman 👍 Did you fill out a CLA agreement? (I have trouble checking right now)

And please squash the commit.

byxorna commented 7 years ago

@gtorre @roymarantz can you guys independently build and verify functionality of this before merging? There is no automated test coverage, so we need to manually verify specific usecases work (also, documenting these usecases in a TESTING.md file would be super helpful)

nshalman commented 7 years ago

Squashed. CLA is in-flight and should land within 48 hours (expected in less than 24, though.)

@byxorna do you want a TESTING.md to be part of this merge or can it be separate?

Also, github doesn't provide an interface for commenting on the commit message, but feedback on the current commit message if it could be clearer is also welcome.

nshalman commented 7 years ago

I have been using the Makefile pasted below to do my development and testing which I am happy to check in if it would be of use to others.

There are four run targets that I use to run the 4 tests to verify the test matrix of whether or not the serial port is active and whether or not GENESIS_AUTOTAIL was specified:

BUILDER=genesis-builder
CONTAINER=genesis
# Run QEMU connecting VM ttyS1 to stdio
QEMU=qemu-system-x86_64 --enable-kvm -smp 2 -m 2048 -kernel output/genesis-vmlinuz -initrd output/genesis-initrd.img -serial null -serial stdio -monitor null
BASE_ARGS=rootflags=loop root=live:/genesis.iso rootfstype=auto ro vga=791 rd_NO_LUKS rd_NO_MD rd_NO_DM GENESIS_MODE=util console=tty1

.PHONY: help clean run-serial run-serial-notail run-tty1 run-tty1-notail

help:
    @echo 'Did you mean "make output"?'

docker-image:
    sudo docker build -t $(BUILDER) .
    touch docker-image

output: docker-image
    mkdir $@
    sudo docker run --name $(CONTAINER) --privileged=true -v $(PWD)/$@:/output $(BUILDER)
    sudo chown -R $(USER) $@

clean:
    sudo losetup -a | cut -d: -f1 | xargs -r -n 1 sudo losetup -d
    sudo rm -rf output docker-image
    sudo docker rm $(CONTAINER) || true
    sudo docker rmi $(BUILDER) || true

run-serial: output
    # Note: ttyS1 must come last for the serial port to get agetty on it.
    @echo This test should result in automatic tailing of log on both tty1 and ttyS1
    $(QEMU) -append "$(BASE_ARGS) console=ttyS1 GENESIS_AUTOTAIL"

run-serial-notail: output
    # Note: ttyS1 must come last for the serial port to get agetty on it.
    @echo This test should result in login prompts on both tty1 and ttyS1
    $(QEMU) -append "$(BASE_ARGS) console=ttyS1"

run-tty1: output
    @echo This test should result in automatic tailing of log on tty1 only
    $(QEMU) -append "$(BASE_ARGS) GENESIS_AUTOTAIL"

run-tty1-notail: output
    @echo This test should result in login prompt on tty1 only
    $(QEMU) -append "$(BASE_ARGS)"
nshalman commented 7 years ago

@roymarantz the CLA has been sent.

roymarantz commented 7 years ago

Great. some final testing and I'll merge and cut a new release Thanks again. Roy

On Fri, Jun 30, 2017 at 9:04 AM, Nahum Shalman notifications@github.com wrote:

@roymarantz https://github.com/roymarantz the CLA has been sent.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tumblr/genesis/pull/67#issuecomment-312261616, or mute the thread https://github.com/notifications/unsubscribe-auth/AF2LSm9lkzOp06Xiy-6oiEtXHi1R_2erks5sJPJHgaJpZM4NlkW1 .

byxorna commented 7 years ago

@nshalman separate pr would be nice, to keep this scoped. Your makefile would be useful to have committed as well

nshalman commented 7 years ago

I've opened #72 with the Makefile and a TESTING.md.

roymarantz commented 7 years ago

I built an image and tested it on baremetal and it worked as expected.