pulumi / pulumi-dotnet

.NET support for Pulumi
Apache License 2.0

Pulumi Inline Automation API process exits immediately when hitting ctrl+c #124

Open klyse opened 1 year ago

klyse commented 1 year ago

What happened?

I have a C# pulumi project that creates cloud infrastructure. I'm using the Pulumi InlineProgram automation API to do that.

When I hit CTRL+C, the Pulumi automation process exits immediately instead of waiting for the provided cancellation token to be canceled. As a result, Pulumi exits without saving its state correctly, which causes problems when retrying later. Example issue:

Pulumi up output on VirtualMachineId xxx Message: " Note that pulumi refresh will need to be run interactively to clear pending CREATE operations."

Whenever the app receives SIGTERM, my state is broken and I have to intervene manually. That's acceptable when I use the Pulumi CLI, but not when I use the Automation API.

Expected Behavior

Pulumi child process stays alive until the CancellationToken is cancelled.

Steps to reproduce

Example of the up function:


var token = new CancellationTokenSource().Token;

await stack.UpAsync(new UpOptions
{
    OnStandardError = error => Logger.Warning("Pulumi up error on VirtualMachineId {VirtualMachineId} Error: {Error}", vm.Id, error),
    OnStandardOutput = msg => Logger.Debug("Pulumi up output on VirtualMachineId {VirtualMachineId} Message: {Message}", vm.Id, msg)
}, token);

Then hit CTRL+C while the up command is running.

Output of pulumi about

CLI
Version      3.60.0
Go Version   go1.20.2
Go Compiler  gc

Host
OS       darwin
Version  13.2.1
Arch     arm64

Backend
Name           pulumi.com
URL            xxx
User           xxx
Organizations  xxx, xxx

Pulumi locates its logs in xxx by default
warning: Failed to read project: no Pulumi.yaml project file found (searching upwards from xxx). If you have not created a project yet, use pulumi new to do so: no project file found
warning: Failed to get information about the current stack: no Pulumi.yaml project file found (searching upwards from /Users/klaus). If you have not created a project yet, use pulumi new to do so: no project file found

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

justinvp commented 1 year ago

Thanks for opening the issue, @klyse. This is definitely worth considering, but it's unclear to me whether UpAsync (and similar methods) should handle CTRL+C automatically or whether your program should handle it. In the meantime, you can work around it by handling CTRL+C yourself, for example: https://stackoverflow.com/a/13899429
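For readers following along, a minimal sketch of that kind of handler is shown here. It assumes the goal is to turn the first CTRL+C into a cancellation request rather than a process exit; the class and method names are hypothetical and not part of Pulumi.Automation. It only covers the parent process, so it says nothing about whether the Pulumi CLI child still receives the signal.

```csharp
using System;
using System.Threading;

public static class CtrlCCancellation
{
    // Hypothetical helper (not a Pulumi API): returns a token source that is
    // cancelled when the user presses CTRL+C in the console.
    public static CancellationTokenSource CreateTokenSource()
    {
        var cts = new CancellationTokenSource();

        Console.CancelKeyPress += (_, eventArgs) =>
        {
            // Keep the parent process alive so it can shut down in an orderly way.
            eventArgs.Cancel = true;

            // Request cancellation; anything awaiting cts.Token observes it.
            cts.Cancel();
        };

        return cts;
    }
}
```

The resulting token would then be passed along, for example as `await stack.UpAsync(options, cts.Token)`, with an `OperationCanceledException` handler around the call.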

klyse commented 1 year ago

Thanks for the prompt response! I think that if you provide an option to pass a cancellation token, the Pulumi child process should not be affected by CTRL+C. Otherwise it leads to unexpected behavior like the one I described above.

I tried your solution already:

using Application.CloudRendering.CloudProviders;
using Pulumi.Automation;
using Serilog;

Console.WriteLine("Hello, World!");

Log.Logger = new LoggerConfiguration()
    .WriteTo.Console()
    .CreateLogger();

var exitEvent = new ManualResetEvent(false);

Console.CancelKeyPress += (sender, eventArgs) =>
{
    eventArgs.Cancel = true;
    exitEvent.Set();
};

var x = AzureStack.VmStack();

var stackArgs = new InlineProgramArgs("d", "asdf1", x);

var stack = await LocalWorkspace.CreateOrSelectStackAsync(stackArgs);

var cts = new CancellationTokenSource();
var ct = cts.Token;
try
{
    try
    {
        await stack.CancelAsync(ct);
    }
    catch (Exception e)
    {
        // Cancelling fails when no update is in progress; log the reason and continue.
        Log.Warning(e, "cannot cancel");
    }

    await stack.RefreshAsync(cancellationToken: ct);
    Log.Information("up");
    await stack.UpAsync(new UpOptions
    {
        Debug = true,
        OnStandardError = error => Log.Warning("Pulumi up: {Error}", error),
        OnStandardOutput = msg => Log.Information("Pulumi up: {Message}", msg)
    }, cancellationToken: ct);
    Log.Information("finish");
}
catch (Exception e)
{
    Log.Error(e, "Error");
}

exitEvent.WaitOne();

Log.Information("bye");

Log.CloseAndFlush();

Unfortunately, it seems the process still exits. Here are the logs:

...
+  azure-native:compute:VirtualMachine vm creating (0s) error: grpc: the client connection is closing
 +  azure-native:compute:VirtualMachine vm **creating failed** error: grpc: the client connection is closing
 +  pulumi:pulumi:Stack d-asdf1 creating (9s) error: update failed
 +  pulumi:pulumi:Stack d-asdf1 **creating failed (0.60s)** 1 error; 18 debugs

...

  azure-native:compute:VirtualMachine (vm):
    error: grpc: the client connection is closing

...

stderr: 

   at Pulumi.Automation.Commands.LocalPulumiCmd.RunAsyncInner(IList`1 args, String workingDir, IDictionary`2 additionalEnv, Action`1 onStandardOutput, Action`1 onStandardError, EventLogFile eventLogFile, CancellationToken cancellationToken)
   at Pulumi.Automation.Commands.LocalPulumiCmd.RunAsync(IList`1 args, String workingDir, IDictionary`2 additionalEnv, Action`1 onStandardOutput, Action`1 onStandardError, Action`1 onEngineEvent, CancellationToken cancellationToken)
   at Pulumi.Automation.Workspace.RunStackCommandAsync(String stackName, IList`1 args, Action`1 onStandardOutput, Action`1 onStandardError, Action`1 onEngineEvent, CancellationToken cancellationToken)
   at Pulumi.Automation.WorkspaceStack.RunCommandAsync(IList`1 args, Action`1 onStandardOutput, Action`1 onStandardError, Action`1 onEngineEvent, CancellationToken cancellationToken)
   at Pulumi.Automation.WorkspaceStack.UpAsync(UpOptions options, CancellationToken cancellationToken)
   at Pulumi.Automation.WorkspaceStack.UpAsync(UpOptions options, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in /Users/xxx/ConsoleApp1/Program.cs:line 45
...

klyse commented 1 year ago

Hi @justinvp, I was wondering if you were able to reproduce this?

justinvp commented 1 year ago

Some notes as I looked into this:

Automation API currently shells out to the CLI under the covers. On macOS/Linux, when CTRL+C is pressed, a SIGINT signal is sent to the entire process group: both the parent process (the Automation API program) and its children (the Pulumi CLI process).

What we likely need to do is run the child CLI process in its own process group, so that when the parent gets SIGINT from CTRL+C the children don't also get it, and then have cancellation of the cancellation token send SIGINT to the children.

Unfortunately, .NET doesn't appear to have a built-in way to do this via Process.Start (see https://github.com/dotnet/runtime/issues/44944). That said, we don't call Process.Start directly; we use https://github.com/Tyrrrz/CliWrap. It looks like improved support for sending SIGINT to child processes when a cancellation token is canceled was recently added there: https://github.com/Tyrrrz/CliWrap/issues/47. That handles sending the signal to children, but it still doesn't address the problem of child processes receiving the parent's SIGINT. We likely need to do something custom when launching processes so that they don't get the parent's SIGINT when CTRL+C is pressed. This is how PowerShell worked around it on Unix: https://github.com/PowerShell/PowerShell/pull/3901 (with more info in the issue: https://github.com/PowerShell/PowerShell/issues/2321).
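To make the signal-forwarding half of this concrete, here is a rough sketch of sending SIGINT to a child process when a cancellation token is canceled. It is not how Pulumi.Automation or CliWrap is implemented; the class name and `RunAsync` helper are hypothetical, the P/Invoke binding to libc `kill` works only on Linux/macOS, and the sketch does nothing to place the child in its own process group, so CTRL+C in a terminal would still reach the child directly.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

public static class SigintForwarding
{
    // SIGINT has the value 2 on both Linux and macOS.
    private const int SIGINT = 2;

    // Binding to the POSIX kill(2) function; Unix only.
    [DllImport("libc", SetLastError = true)]
    private static extern int kill(int pid, int sig);

    // Hypothetical helper: starts a command and sends it SIGINT when the
    // token is cancelled, then waits for the child to exit on its own.
    public static async Task<int> RunAsync(
        string fileName, string arguments, CancellationToken token)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false
        };

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Process failed to start.");

        // When cancellation is requested, ask the child to stop instead of
        // killing it, so it has a chance to finish its own cleanup.
        using var registration = token.Register(() =>
        {
            if (!process.HasExited)
            {
                kill(process.Id, SIGINT);
            }
        });

        await process.WaitForExitAsync();
        return process.ExitCode;
    }
}
```

For example, `await SigintForwarding.RunAsync("sleep", "30", cts.Token)` returns shortly after `cts.Cancel()` is called rather than after 30 seconds.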

klyse commented 1 year ago

Thanks for the explanation, @justinvp! I understand the problem much better now, and I fear there is no "easy" workaround until a fix is available. Maybe we can somehow prevent the Pulumi state from being corrupted and needing manual input when the process exits? (I'm talking about this message: Note that pulumi refresh will need to be run interactively to clear pending CREATE operations)

klyse commented 1 year ago

Update: since my app is running in a Docker container, I changed the Docker termination signal to SIGHUP, which does not affect Pulumi. Then, in my C# app, I listen for SIGHUP to initiate the graceful stop:

var lifetime = hostBuilder.Services.GetRequiredService<IHostApplicationLifetime>();
using var posixSignalRegistration = PosixSignalRegistration.Create(StopSignal, signal =>
{
    signal.Cancel = true;
    Log.Warning("Received posix {Signal}, stopping application", signal.Signal);
    lifetime.StopApplication();
});
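The same idea can be sketched without the generic host, by tying the signal directly to the CancellationTokenSource whose token is passed to UpAsync. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, it requires .NET 6 or later for PosixSignalRegistration, and it assumes the container runtime has been configured to send SIGHUP as its stop signal (for instance with a `STOPSIGNAL SIGHUP` line in the Dockerfile).

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

public static class SignalCancellation
{
    // Hypothetical helper: cancels the given token source when the process
    // receives the given POSIX signal. Dispose the returned registration to
    // stop listening.
    public static PosixSignalRegistration CancelOn(
        PosixSignal signal, CancellationTokenSource cts)
    {
        return PosixSignalRegistration.Create(signal, context =>
        {
            // Suppress the default action (process termination) for this signal.
            context.Cancel = true;

            // Let awaiting operations observe the shutdown request.
            cts.Cancel();
        });
    }
}
```

Usage would look like `using var registration = SignalCancellation.CancelOn(PosixSignal.SIGHUP, cts);` followed by `await stack.UpAsync(options, cts.Token);`.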

It's not perfect, but a workaround ;)