pulumi / ci-mgmt

Configuration for all things CI
Apache License 2.0

No space left on device blocks Provider upgrades #554

Closed t0yv0 closed 1 year ago

t0yv0 commented 1 year ago

Some recent changes in Go SDK generation pushed the builds over the limit of disk space.

[build_sdk (go)](https://github.com/pulumi/pulumi-azure/actions/runs/6116569557/job/16603257286)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.308.0/_diag/Worker_20230908-024953-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.308.0/_diag/Worker_20230908-024953-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.308.0/_diag/Worker_20230908-024953-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)

I haven't investigated deeply, but this may also be related to Go build dependency caching. If changes in dependencies invalidate the cache but we don't track the cache key accurately, the job can download the stale cache and then download the new packages again anyway during go build, which can push things over the line. A workaround to try here is tinkering with cache keys to force a miss.
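To illustrate the cache-key idea, here is a minimal Go sketch, assuming the key is derived from the runner OS, a manually bumped version string, and a hash of go.sum. The function name `cacheKey` and the key layout are hypothetical and are not what ci-mgmt actually uses; the point is only that a dependency change or a manual bump yields a different key and therefore a cache miss.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey builds a cache key from the runner OS, a manually bumped
// version string, and the contents of go.sum. The layout is an
// illustrative assumption, not the key format used by any real workflow.
func cacheKey(goos, version string, goSum []byte) string {
	sum := sha256.Sum256(goSum)
	return fmt.Sprintf("go-%s-%s-%s", goos, version, hex.EncodeToString(sum[:8]))
}

func main() {
	// Hypothetical go.sum contents before and after a dependency bump.
	before := []byte("example.com/dep v1.0.0 h1:aaaa\n")
	after := []byte("example.com/dep v1.1.0 h1:bbbb\n")

	fmt.Println(cacheKey("linux", "v1", before))
	fmt.Println(cacheKey("linux", "v1", after))  // differs: the dependency change forces a miss
	fmt.Println(cacheKey("linux", "v2", before)) // differs: the manual bump forces a miss
}
```

Bumping the version string is the blunt way to force a miss once; hashing go.sum is what keeps the key accurate afterwards.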

Or it could be excessively chatty logging, which we might need to compress here.

iwahbe commented 1 year ago

https://github.com/pulumi/pulumi-aws/pull/2792/commits/0301de42af65062534284d290c766b4910dd50d8 fixed the build error. This is a bandaid more than a cure, but it will get us going again.

t0yv0 commented 1 year ago

Yeah, this is what @aq17 and I landed on trying, based on @thomas11's idea earlier in the day. More power.

t0yv0 commented 1 year ago

Unfortunately the K8S repo now fails in the test(go) target with the same out-of-disk (OOD) error. This is in the way of my P1 fixes, so I'd like to take this and chase it down a bit deeper.

t0yv0 commented 1 year ago

We've leaned heavily into scheduling workloads on the pulumi-ubuntu-8core runner. This solution seems OK for now but may cause problems if the custom runner is out of capacity. In that case the recommendation is to use the GitHub runners with more disk space. I was not able to fully root-cause this for lack of time, but I was able to measure K8s disk draw in the Go test job:

free: 231G
$ install stuff
free: 226G (-5G)
$ go test -run COMPILEONLY
free: 218G (-8G)
$ go test
free: 213G (-5G)

The total draw of about 18G exceeds the 14G available on stock runners.
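The arithmetic behind that comparison can be spelled out in a few lines of Go. The readings are the ones measured above; the `reading` type and `totalDraw` helper are just illustrative names.

```go
package main

import "fmt"

// reading is one free-disk measurement, in G, taken after a step.
type reading struct {
	step  string
	freeG int
}

// totalDraw returns how much disk was consumed between the first and
// last reading.
func totalDraw(rs []reading) int {
	if len(rs) == 0 {
		return 0
	}
	return rs[0].freeG - rs[len(rs)-1].freeG
}

func main() {
	rs := []reading{
		{"start", 231},
		{"install stuff", 226},
		{"go test -run COMPILEONLY", 218},
		{"go test", 213},
	}

	// Per-step draw is the difference between consecutive readings.
	for i := 1; i < len(rs); i++ {
		fmt.Printf("%-26s -%dG\n", rs[i].step, rs[i-1].freeG-rs[i].freeG)
	}

	const stockFreeG = 14 // free space on stock runners, per the measurement above
	draw := totalDraw(rs)
	fmt.Printf("total draw: %dG, fits in %dG: %v\n", draw, stockFreeG, draw <= stockFreeG)
	// prints "total draw: 18G, fits in 14G: false"
}
```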

There are multiple reasons Go is so resource-hungry here, though I don't have exact data:

1. The Azure SDK, two AWS SDKs, and the GCP SDKs are pulled into the compilation unit through ProgramTest, because pulumi/pkg has spurious dependencies on the Pulumi state backends.
2. When tests run, ProgramTest uses more disk space by creating project copies; there may be a cleanup opportunity that's being missed.

t0yv0 commented 1 year ago

I'll close for now as I'm not sure it affects us anymore atm with the workarounds in place.

pulumi-bot commented 1 year ago

Cannot close issue:

Please fix these problems and try again.

iwahbe commented 1 year ago

It's worth noting that pulumi-aws fails when building the complete Go SDK, before it reaches the Go integration tests.

t0yv0 commented 1 year ago

Yes, most providers just fail to build, but K8S also fails to test. In both cases, I think what sinks the runner is the compilation burden of either the SDK or the tests, with all their transitive dependencies.