iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0
290 stars 27 forks source link

`runner` tempdir patch #582

Closed dacbd closed 2 years ago

dacbd commented 2 years ago

I think the real fix is probably to run the GHA client in a separate slice enforcing resource limits

0x2b3bfa0 commented 2 years ago

Launching the GitHub Actions runner in a separate slice is the ersatz version of using containers. 🙃 Still, sounds like a good workaround to have a bit more control over runners without triggering the container detection logic from https://github.com/iterative/terraform-provider-iterative/issues/146#issuecomment-931308916 and https://github.com/actions/runner/issues/367#issuecomment-597742895.

dacbd commented 2 years ago

I confess to not completely replicating the original failure brought by @JacksonMaxfield to confirm, but

Silly question: is the default mktemp path being mounted as tmpfs on the images we use? 🤔

No, as I have seen it survive reboots, however, I believe there is a reaping process that hits the dir. Or there was an existing process already in memory that deleted enough files from disk when it got SIGTERM from rebooting my test.

dacbd commented 2 years ago

@0x2b3bfa0 looks like we should just do this: https://github.com/iterative/terraform-provider-iterative/issues/62 ?

0x2b3bfa0 commented 2 years ago

@0x2b3bfa0 looks like we should just do this: https://github.com/iterative/terraform-provider-iterative/issues/62?

Not until Docker releases the fix? 🤔

dacbd commented 2 years ago

sorry I just saw that the link PR has been merged so I assumed it's been released?

0x2b3bfa0 commented 2 years ago

As per the relevant milestone, it has not been released yet

0x2b3bfa0 commented 2 years ago

No, as I have seen it survive reboots, however, I believe there is a reaping process that hits the dir. Or there was an existing process already in memory that deleted enough files from disk when it got SIGTERM from rebooting my test.

I guess I don't have enough context. 😅 Looks clear as to me.

0x2b3bfa0 commented 2 years ago

If /tmp is not mounted as tmpfs, what does this pull request fix? 🤔