pulumi / actions

Deploy continuously to your cloud of choice, using your favorite language, Pulumi, and GitHub!
Apache License 2.0

Sporadic "Command failed with exit code 255" error #861

Open analogrelay opened 1 year ago

analogrelay commented 1 year ago

What happened?

We have a GitHub workflow that runs pulumi preview and pulumi up. Sometimes the workflow succeeds, but often it fails with an unknown error. The error is sporadic and usually goes away after one or more retries. It has never reproduced when running locally.

I've redacted our org/project name and the preview GUID but I'm happy to share if needed.

 code: -2
 stdout: 
 stderr: Command failed with exit code 255: pulumi preview --parallel 2147483647 --exec-agent pulumi/actions@v3 --color auto --exec-kind auto.local --event-log /tmp/automation-logs-preview-Cnn9eo/eventlog.txt --stack [our-org]/production --non-interactive
Previewing update (aseriousbiz/production)

View Live: https://app.pulumi.com/aseriousbiz/[our-project]/production/previews/[guid]

@ Previewing update....
    pulumi:pulumi:Stack [our-project]-production running 
@ Previewing update.........................................
    pulumi:pulumi:Stack [our-project]-production running error: an unhandled error occurred: Program exited with non-zero exit code: -1
    pulumi:pulumi:Stack [our-project]-production  1 error

Diagnostics:
  pulumi:pulumi:Stack ([our-project]-production):
    error: an unhandled error occurred: Program exited with non-zero exit code: -1

 err?: Error: Command failed with exit code 255: pulumi preview --parallel 2147483647 --exec-agent pulumi/actions@v3 --color auto --exec-kind auto.local --event-log /tmp/automation-logs-preview-Cnn9eo/eventlog.txt --stack [our-org]/production --non-interactive
Previewing update (aseriousbiz/production)

View Live: https://app.pulumi.com/aseriousbiz/[our-project]/production/previews/[guid]

@ Previewing update....
    pulumi:pulumi:Stack [our-project]-production running 
@ Previewing update.........................................
    pulumi:pulumi:Stack [our-project]-production running error: an unhandled error occurred: Program exited with non-zero exit code: -1
    pulumi:pulumi:Stack [our-project]-production  1 error

Diagnostics:
  pulumi:pulumi:Stack ([our-project]-production):
    error: an unhandled error occurred: Program exited with non-zero exit code: -1

Expected Behavior

The deployment should succeed, or at least provide meaningful context as to why the error occurred. I see #589 is tracking a way to add increased verbosity, which would be very helpful here.

Steps to reproduce

I don't have good repro steps since it's heavily dependent upon our private project.

Output of pulumi about

This is the output from my local machine, though as I said it's never reproduced there:

CLI
Version      3.53.1
Go Version   go1.19.5
Go Compiler  gc

Plugins
NAME    VERSION
nodejs  unknown

Host
OS       darwin
Version  13.1
Arch     arm64

This project is written in nodejs: executable='/Users/anurse/.nodenv/shims/node' version='v16.9.0'

Current Stack: aseriousbiz/abbot-core/canary

TYPE                                                                      URN
<redacted>

Found no pending operations associated with aseriousbiz/canary

Backend
Name           pulumi.com
URL            https://app.pulumi.com/serious-anurse
User           serious-anurse
Organizations  serious-anurse, aseriousbiz

Dependencies:
NAME  VERSION
      0.0.0
      0.0.0
      0.0.0

Pulumi locates its logs in /var/folders/6f/pbj0nvr972sdlddd7w7pygt00000gn/T/ by default

Additional context

No response

analogrelay commented 1 year ago

Further info: I believe this may be because a child process is being OOM-killed. I sometimes notice heavy memory usage when running pulumi preview on this project on my local machine. I suspect it involves something that is normally cached locally, since most pulumi preview runs don't use that much RAM. On GitHub Actions it's a fresh machine each time (we don't use any caching), so every run would use that extra memory.
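
One way to test the OOM hypothesis in CI would be to log the Node.js process's memory while the program runs. The following is a minimal sketch, not part of Pulumi or pulumi/actions: `toMiB` and `formatMemory` are hypothetical helper names, and the approach only covers the Node.js program itself, not the pulumi CLI or provider plugins that run as separate processes.

```typescript
// Hypothetical helpers; not part of Pulumi or pulumi/actions.

// Formats a byte count as mebibytes with one decimal place.
function toMiB(bytes: number): string {
  return (bytes / (1024 * 1024)).toFixed(1) + " MiB";
}

// Builds one log line from a memory snapshot shaped like the value
// returned by Node's process.memoryUsage().
function formatMemory(usage: { rss: number; heapUsed: number }): string {
  return `rss=${toMiB(usage.rss)} heapUsed=${toMiB(usage.heapUsed)}`;
}
```

Called near the top of the Pulumi program as `setInterval(() => console.error(formatMemory(process.memoryUsage())), 5000).unref()`, this should print a snapshot every five seconds. The last line before a failure would give a rough idea of how much memory the program held when it died.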

dixler commented 1 year ago

Hi. Thanks for posting this. I understand that your project is private, but it would be greatly appreciated if you or someone facing this issue could provide a repro to help us get to the bottom of this. 🙏

UnstoppableMango commented 1 year ago

Hi! I think I'm encountering the same issue. Here is a workflow run where this occurred, and here is a tag on the commit that had a failure. My Pulumi program is located in /infra. (Please ignore my terrible code.)

For me this occurs nearly every run so I suspect it's more likely an issue with something I'm doing, but it looks very similar to the error reported above.

UnstoppableMango commented 1 year ago

My issue appeared to be related to the Node.js version somehow. I downgraded to LTS 18.16.0 and the error went away. Sorry for bothering this thread!
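
Since the failure here tracked the Node.js version, it may help to fail fast when the runner's Node.js differs from the version the program was tested with. A minimal sketch, assuming major version 18 is the known-good one (based only on the 18.16.0 downgrade above); `nodeMajor` and `checkNodeMajor` are hypothetical helpers, not Pulumi APIs.

```typescript
// Hypothetical helpers; not part of Pulumi or pulumi/actions.

// Extracts the major version from strings like "v18.16.0" or "18.16.0".
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version);
  if (match === null) {
    throw new Error(`Unrecognized Node.js version: ${version}`);
  }
  return Number(match[1]);
}

// Returns an error message when the running major version differs from
// the expected one, or null when they match.
function checkNodeMajor(version: string, expectedMajor: number): string | null {
  const actual = nodeMajor(version);
  return actual === expectedMajor
    ? null
    : `Expected Node.js ${expectedMajor}.x but found ${version}`;
}
```

Passing `process.version` as the first argument at program start, and throwing when the result is not null, would surface a version mismatch as an explicit message instead of an opaque exit code.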