Sandbox process continues emitting logs to `boot` logs after sandbox has booted and user container has started

jseba commented 1 week ago

Description

I was digging through some logs today and noticed that the gofer doesn't seem to be respecting the --debug-command boot,create option the way the sandbox does. It continues to emit debug logs well after the sentry has completed the create and boot steps and this causes a lot of noise when it's not expected that it should keep logging. I've made a change to our runtime to stop forwarding these logs after gvisor has started the guest process but I wasn't sure if it was intended that the gofer continue writing debug logs. I tried adding !gofer to the list but it didn't seem to have an effect on the output. Also, gvisor continues to print Time: Adjusting syscall overhead down to 7 the entire time as well, even after boot, so that's another component that doesn't seem to respect the debug command filters.

Note that we still have directfs disabled, as the performance benefits are not relevant to our workloads.

Steps to reproduce

Install gvisor runtime with debug options limited to --debug-command=boot,create and --directfs=false
Run a docker container and do some I/O
Inspect the debug logs and see that gofer still prints debug logs and the sampler continues printing debug logs as well

runsc version

`release-20241021.0`

docker version (if using docker)

uname

No response

kubectl (if using Kubernetes)

repo state (if built from source)

No response

runsc debug logs (if available)

ayushr2 commented 1 week ago

I can't repro this.

It continues to emit debug logs well after the sentry has completed the create and boot steps

Could you explain how you know the logs are being emitted by the gofer? Note that boot logs is not just logs from the boot step. It is the sandbox process logs. The boot logs will keep being written to as long as the sandbox is running.

A good way to verify which command is producing what logs is to use something like --debug-log=/tmp/logs/ (notice the trailing slash). For each docker run, a new file will be created for each command. So you will see entries like runsc.log.20241113-190225.428587.boot.txt and runsc.log.20241113-190225.428587.create.txt. When using --debug-command=boot,create, I only see the boot.txt and create.txt file.

You can inspect boot.txt and you will notice that the logs you are mentioning (which are being emitted at runtime) are actually coming from the boot command. The boot command boots up the sandbox and runs it. It is the sandbox process. Each command corresponds to a process. For example, the create.txt file corresponds to the invocation of the runsc create command which runs in its own process.

I don't think there is a bug with --debug-command.

jseba commented 1 week ago

Ah I see, so these are client side on the sandbox then, gotcha. I still think there's a bug that they should not be emitted after the guest process launches, as this is post-boot. From the name, I would expect boot to be only the part up to the part where control is handed to the guest code and we do want to capture these debug logs if something fails during this phase, but we do not want any logs once the guest process has started executed.

EtiennePerot commented 1 week ago

I see your point. That would be a different functionality than what --debug-command promises. Perhaps the runsc boot subcommand should be renamed to runsc sentry to make it better reflect what it actually means, but in turn this would mean that --debug-command=boot would stop working, and that --debug-command=sentry would do what --debug-command=boot does today. Having a different flag that means mean "give me the sentry logs but only up to the point where the user application starts" would be additional logic and functionality. A valid feature request, sure, but probably a different option than --debug-command.

jseba commented 1 week ago

That makes sense to me, having the name reflect the behavior would have removed the expectation of behavior from my perspective for sure. On my end, I've addressed the issue by disconnecting the debug pipe from our log buffer and discarding the buffer once we've established that givsor has successfully booted, so it's not a high priority ticket, especially if it's not as simple as "change the logging level at a specific point in the sandbox lifetime. It probably makes more sense as a --debug-subsystem or something along those lines to get any more granular.

google / gvisor