puppetlabs / bolt

Bolt is an open source orchestration tool that automates the manual work it takes to maintain your infrastructure on an as-needed basis or as part of a greater orchestration workflow. It can be installed on your local workstation and connects directly to remote nodes with SSH or WinRM, so you are not required to install any agent software.
https://puppet.com/docs/bolt/latest/bolt.html
Apache License 2.0

Streaming stdout and stderr #102

Closed philtech-oh closed 3 years ago

philtech-oh commented 6 years ago

When running a script or command remotely, Bolt should stream the output (stdout and stderr) back in near real time. For a script that takes a long time to complete, it is frustrating to wait until it finishes to see any output.

Iristyle commented 6 years ago

Thanks - we have already discussed this a bit internally. This may have to be limited to SSH transport only to start. WinRM streaming can be a bit tricky given the nature of the SOAP based protocol.

genebean commented 6 years ago

FWIW, fabric does this really well. It prefixes all the output with the name of the host it ran on and a timestamp. It also returns the command that was run as part of the output (I think) which makes keeping track of where you are in the process much easier.

All in all, this combination of feedback makes it easy to track what's going on and, when output arrives in parallel, the name prefixing lets you tell one host from another. If you need to see a single system's output after the fact for debugging, you can easily filter the text in your console by hostname.
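
For anyone wanting to approximate fabric's prefixed streaming outside of either tool, here is a minimal bash sketch. The hostnames are placeholders, and `sh -c` stands in for `ssh "$host"` so the example is self-contained and runnable locally:

```shell
#!/usr/bin/env bash
# Sketch of fabric-style prefixed streaming. Hostnames are hypothetical;
# in real use, `sh -c "$*"` would be `ssh "$host" "$@"`.
stream_prefixed() {
  local host="$1"; shift
  # Prefix every line of combined stdout/stderr with the host name.
  sh -c "$*" 2>&1 | sed "s/^/[$host] out: /"
}

# Run against several "hosts" in parallel, then wait for all of them.
stream_prefixed host1.example.com 'echo 10; sleep 1; echo 9' &
stream_prefixed host2.example.com 'echo 10; sleep 1; echo 9' &
wait
```

Because each pipeline tags its own lines, interleaved parallel output stays attributable to a host, which is the property being asked for here.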

MikaelSmith commented 6 years ago

If you're using SSH or WinRM, this may already be visible via --debug. Try bolt command run 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' -n some_ssh_nodes --debug or bolt command run 'for ($i=10; $i -ge 1; $i--) { sleep 1; echo $i }' -n some_winrm_nodes --debug.

krorvikrl commented 6 years ago

I'm +1 on this. It is useful for pipelines that generate a bit of output, for instance.

By the way, we've seen cases where bolt seems to buffer output for a while, then simply returns no output and exits nonzero, failing the relevant pipeline, even when the job succeeds on the remote node. Is this due to overflowing some buffer?

In either case, this is an interesting feature for us :)

genebean commented 6 years ago

@MikaelSmith You are correct that with --debug bolt will stream output, but in my opinion its usefulness is limited. The difference between bolt's --debug output and fabric's normal output is that with fabric you know which host each line came from.

I generated the output below using each tool and the sample command you listed above. In the fabric output you may notice that each host's lines are grouped together; that is a coincidence, as I have often seen the hosts' output mixed together, which is why prefixing each piece of output is very important to me.

Bolt Output

$ bolt command run 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' -n some-server-prod-6.example.com,some-server3-prod.example.com,some-server5-prod.example.com --debug

Did not find config for some-server-prod-6.example.com in inventory
Did not find config for some-server3-prod.example.com in inventory
Did not find config for some-server5-prod.example.com in inventory
Started with 100 max thread(s)
Starting: command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on some-server-prod-6.example.com, some-server3-prod.example.com, some-server5-prod.example.com
Authentication method 'gssapi-with-mic' is not available
Running command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on ["some-server-prod-6.example.com"]
Running command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on some-server-prod-6.example.com
Running command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on ["some-server5-prod.example.com"]
Running command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on ["some-server3-prod.example.com"]
Running command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on some-server5-prod.example.com
Running command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' on some-server3-prod.example.com
Opened session
Executing: for ((i=10; i>=1; i--)) do sleep 1; echo $i; done
stdout: 10

Opened session
Executing: for ((i=10; i>=1; i--)) do sleep 1; echo $i; done
Opened session
Executing: for ((i=10; i>=1; i--)) do sleep 1; echo $i; done
stdout: 9

stdout: 10

stdout: 10

stdout: 8

stdout: 9

stdout: 9

stdout: 7

stdout: 8

stdout: 8

stdout: 6

stdout: 7

stdout: 7

stdout: 5

stdout: 6

stdout: 6

stdout: 4

stdout: 5

stdout: 5

stdout: 3

stdout: 4

stdout: 4

stdout: 2

stdout: 3

stdout: 3

stdout: 1

Command returned successfully
Closed session
{"node":"some-server-prod-6.example.com","status":"success","result":{"stdout":"10\n9\n8\n7\n6\n5\n4\n3\n2\n1\n","stderr":"","exit_code":0}}
stdout: 2

stdout: 2

stdout: 1

Command returned successfully
Closed session
{"node":"some-server3-prod.example.com","status":"success","result":{"stdout":"10\n9\n8\n7\n6\n5\n4\n3\n2\n1\n","stderr":"","exit_code":0}}
stdout: 1

Command returned successfully
Closed session
{"node":"some-server5-prod.example.com","status":"success","result":{"stdout":"10\n9\n8\n7\n6\n5\n4\n3\n2\n1\n","stderr":"","exit_code":0}}
Finished: command 'for ((i=10; i>=1; i--)) do sleep 1; echo $i; done' with 0 failures in 14.25 sec
Started on some-server-prod-6.example.com...
Started on some-server5-prod.example.com...
Started on some-server3-prod.example.com...
Finished on some-server-prod-6.example.com:
  STDOUT:
    10
    9
    8
    7
    6
    5
    4
    3
    2
    1
Finished on some-server3-prod.example.com:
  STDOUT:
    10
    9
    8
    7
    6
    5
    4
    3
    2
    1
Finished on some-server5-prod.example.com:
  STDOUT:
    10
    9
    8
    7
    6
    5
    4
    3
    2
    1
Successful on 3 nodes: some-server-prod-6.example.com,some-server3-prod.example.com,some-server5-prod.example.com
Ran on 3 nodes in 14.25 seconds

Fabric Output

$ fab -H some-server-prod-6.example.com,some-server3-prod.example.com,some-server5-prod.example.com output_test

[some-server-prod-6.example.com] Executing task 'output_test'
[some-server3-prod.example.com] Executing task 'output_test'
[some-server5-prod.example.com] Executing task 'output_test'
[some-server3-prod.example.com] run: for ((i=10; i>=1; i--)) do sleep 1; echo $i; done
[some-server3-prod.example.com] out: 10
[some-server3-prod.example.com] out: 9
[some-server3-prod.example.com] out: 8
[some-server3-prod.example.com] out: 7
[some-server3-prod.example.com] out: 6
[some-server3-prod.example.com] out: 5
[some-server3-prod.example.com] out: 4
[some-server3-prod.example.com] out: 3
[some-server3-prod.example.com] out: 2
[some-server3-prod.example.com] out: 1
[some-server3-prod.example.com] out: 

[some-server5-prod.example.com] run: for ((i=10; i>=1; i--)) do sleep 1; echo $i; done
[some-server5-prod.example.com] out: 10
[some-server5-prod.example.com] out: 9
[some-server5-prod.example.com] out: 8
[some-server5-prod.example.com] out: 7
[some-server5-prod.example.com] out: 6
[some-server5-prod.example.com] out: 5
[some-server5-prod.example.com] out: 4
[some-server5-prod.example.com] out: 3
[some-server5-prod.example.com] out: 2
[some-server5-prod.example.com] out: 1
[some-server5-prod.example.com] out: 

[some-server-prod-6.example.com] run: for ((i=10; i>=1; i--)) do sleep 1; echo $i; done
[some-server-prod-6.example.com] out: 10
[some-server-prod-6.example.com] out: 9
[some-server-prod-6.example.com] out: 8
[some-server-prod-6.example.com] out: 7
[some-server-prod-6.example.com] out: 6
[some-server-prod-6.example.com] out: 5
[some-server-prod-6.example.com] out: 4
[some-server-prod-6.example.com] out: 3
[some-server-prod-6.example.com] out: 2
[some-server-prod-6.example.com] out: 1
[some-server-prod-6.example.com] out: 

Done.
MikaelSmith commented 6 years ago

I agree we can definitely improve the formatting. I think it should also be something you can enable separate from --debug.

scoopex commented 6 years ago

Probably this is related to: https://github.com/puppetlabs/bolt/issues/429

Parallel execution with direct streaming of output would be great.

genebean commented 6 years ago

Just wanted to see if there was any update on this @MikaelSmith

MikaelSmith commented 6 years ago

@genebean I put up a PR with some ideas; mind trying it out?

jadestorm commented 5 years ago

Just to throw in another potential use case for this: I am using bolt with rundeck to run jobs (primarily on windows nodes right now; rundeck's built-in ssh works just fine for non-windows), and it is often helpful to see live output in the rundeck logs while a long job is running.

genebean commented 5 years ago

@MikaelSmith The stream option in https://github.com/puppetlabs/bolt/pull/625 looks really nice! The one question I have is this: could it optionally not repeat all the output when the command finishes? That is, when streaming, don't show the same info again via the normal output. In my mind this would be a supplemental flag to the streaming one.

Henning-B commented 5 years ago

Any news on this? It would be really helpful.

genebean commented 4 years ago

With the change to CD4PE, which puts bolt in use there too, this would be even more valuable to me now.

lcrownover commented 4 years ago

Would love to get an update on this. For running tasks, it's generally not needed, but if you heavily use the "command run" feature, streaming output is a must!

voiprodrigo commented 4 years ago

Really need this too to use in run_script calls in plans. Thanks!

tskirvin commented 3 years ago

+1 for this feature, it would make certain tasks much easier (like, say, rolling reboots across a cluster of interactive nodes).

lucywyman commented 3 years ago

@tskirvin What do you mean by "interactive nodes"? Aside from, I think, the prompt inventory plugin, Bolt won't allow you to write to stdin while anything is running, as managing that interaction (looking for a prompt on stdout, getting input from the user for all of their targets, writing it back, etc.) is both complicated and antithetical to distributed remote execution in general.

tskirvin commented 3 years ago

The machines themselves are interactive login nodes, so I need to give users warning before I reboot the machines out from under them; hence there are a lot of 'sleep' commands in the scripts. It's good to have the bolt run tell me "this machine is about to start its reboot" instead of only reporting after the whole cluster has rebooted.

scoopex commented 3 years ago

> @tskirvin What do you mean by "interactive nodes"?

I suppose he means something like "vim", a dist-upgrade script (where you want to monitor progress rather than wait for the entire result), "top", or "journalctl -f".

lucywyman commented 3 years ago

Hmm, it seems like even if you could watch output for all the targets, it wouldn't give you much time to notify users if it only logged right before actually rebooting. I think it'd also be pretty easy to see what Bolt is about to do by writing a plan and using the out::message function to say what the script about to run will do. That prints while the plan is running (not at the end), so you can watch the output and notify your users. Something like this (pseudocode):

plan myplan(
  TargetSpec $targets
) {
  run_script("configure_some_things", $targets)
  run_script("do_some_other_tasks", $targets)
  $tnames = get_targets($targets).map |$t| { $t.name }
  out::message("Starting reboot on ${join($tnames, ", ")}")
  # We also have a sleep function that can give you some time if you need
  ctrl::sleep(30)
  run_script("reboot_machines", $targets)
}

Would something like that work?

fionera commented 3 years ago

Is there any new information regarding this? I currently have a manifest that takes about 17 minutes to apply, and it's very annoying to wait for completion to see the log. It also uses a lot of memory (4 GB) when running on the node.

bobhenkel commented 3 years ago

Ran into this issue today. I have set up a bolt module for our team that standardizes our docker build process (build, tagging, testing, and pushing). The goal is that we can build locally or on Jenkins in exactly the same way. If I or Jenkins runs bolt task run docker::build -t local, I don't see any of the output until the entire docker build process finishes. It would really improve the feedback loop to see this information. Some of our tasks are literally tasks that a user may run locally, targeting their local workstation as well as remote systems, and just getting a spinner with the text Started on localhost... | is not ideal. On something that can run for 10 minutes or more, it's hard to tell whether it's hung or still processing.

The plan approach @lucywyman showed is definitely an option to improve on the lack of live streaming of script output, and I suppose I can echo inside my bash task that the docker build is starting. In my use case, though, I would like to tell bolt that I'm running this task locally and to please just live stream the output to me, because it makes sense in this context. I can see how live streaming for hundreds or thousands of nodes doesn't, though.

Thanks Bob

bobhenkel commented 3 years ago

For now, to get what I want, I stream my output to a file from within my task code using docker build blahblah . > taskxyz.log, then live tail it with tail -f taskxyz.log to see what's going on in real time. Not ideal, but not the worst workaround I've done in my life. Not sure how well this will work inside Jenkins, but locally it's livable.

bobhenkel commented 3 years ago

The issue of not seeing stdout until after something has completed was causing me to itch and putting a damper on my bolt use case. After some pro-level googling, and thanks to https://serverfault.com/a/1023944/73399, I was able to apply the idea to bolt and get exactly what I wanted with what I feel is a reasonable workaround.

I ended up with a bash script in my repo that is the entry point for running puppet bolt commands, either locally or on my CI system (Jenkins in my case). The log file could be updated to be a parameter you pass a path to, or you could generate a UUID-based file name so that concurrent processes don't have file conflicts, but the idea is there. Hope it helps someone.

run_bolt.sh

#!/usr/bin/env bash
set -m
# Remove any stale log so the tail below starts clean.
if [ -f "build.log" ]; then
    rm "build.log"
fi
# Tail the log in the background while bolt runs in the foreground.
touch build.log && tail -f build.log &
bolt plan run docker::build --transport local -t localhost
kill %1 # ensure the background tail job is stopped

build.sh (the shell script implementing the task that plan docker::build calls). It writes stdout and stderr to the file being tailed.

#!/usr/bin/env bash
docker build -t bob_image . >> build.log 2>&1
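
One tweak to the scripts above (my suggestion, not part of Bob's setup): mktemp can give each run its own log file, addressing the concurrent-process conflict mentioned earlier. The bolt invocation itself is elided since the task would also need to append to the unique log path instead of a fixed build.log:

```shell
#!/usr/bin/env bash
# Hypothetical variant of run_bolt.sh: allocate a unique log per run
# with mktemp so concurrent CI jobs don't clobber a shared build.log.
new_build_log() {
  mktemp "${TMPDIR:-/tmp}/bolt-build.XXXXXX"
}

logfile="$(new_build_log)"
tail -f "$logfile" &
# ... invoke `bolt plan run docker::build ...` here; the task would
# need to append its output to "$logfile" instead of build.log ...
kill "$!"   # stop the background tail
```

Using `kill "$!"` (the PID of the last background job) avoids relying on job control being enabled in a non-interactive shell.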

lucywyman commented 3 years ago

Still 'experimental', but I'm going to close this for now; please open new issues for any problems or improvements with the existing --stream flag. For those who haven't checked Bolt out in a while: https://puppet.com/docs/bolt/latest/experimental_features.html#streaming-output