iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0

Optimize cloud-init user data script #287

Open 0x2b3bfa0 opened 2 years ago

0x2b3bfa0 commented 2 years ago

On public cloud machines, the startup script needs to be compatible with any modern and sensible GNU+Linux image, and that requires some optimizations and workarounds. For example, we should try to avoid depending on package managers and software other than systemd.

Installing rclone

Whilst rclone binary distributions are self-contained Go executables, they come bundled in zip files, and extracting them in a portable way is hard.
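To illustrate what the extraction step involves, here is a minimal sketch that pulls a single executable out of a zip archive using only Python's standard library. It is not the provider's startup script, which cannot assume Python is present on the image. The function name `extract_member` is hypothetical, and the assumption that the binary sits under a versioned top-level directory inside the archive is mine, not something stated in this issue.

```python
import os
import shutil
import stat
import zipfile


def extract_member(zip_path, member_name, dest_path):
    """Copy one file out of a zip archive and mark it executable."""
    with zipfile.ZipFile(zip_path) as archive:
        # Match on the base name so the (assumed) versioned top-level
        # directory inside the archive does not need to be known in advance.
        matches = [
            info for info in archive.infolist()
            if not info.is_dir()
            and os.path.basename(info.filename) == member_name
        ]
        if not matches:
            raise FileNotFoundError(f"{member_name} not found in {zip_path}")
        # Stream the first match to disk instead of reading it into memory.
        with archive.open(matches[0]) as source, open(dest_path, "wb") as target:
            shutil.copyfileobj(source, target)
    # Copying the bytes does not carry over Unix permission bits,
    # so the executable bits are set explicitly.
    mode = os.stat(dest_path).st_mode
    os.chmod(dest_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest_path
```

The permission step is one reason a bare "unzip" equivalent is not enough: the extracted file also has to end up executable, and not every extraction tool guarantees that.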

Watching files

In order to avoid installing a file watcher,[^1] we've replaced “smart” data+log synchronization with polling, which considerably increases transfer costs.
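For readers unfamiliar with the trade-off, the sketch below shows what a watcher-free polling loop can look like, and one way to limit what it re-transfers by comparing file size and modification time between rounds. It is an illustration under my own assumptions, not code from the provider: `snapshot`, `changed_paths`, `poll`, and the `upload` callback are all hypothetical names.

```python
import os
import time


def snapshot(root):
    """Map each file under root to its (size, mtime_ns) signature."""
    state = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            try:
                info = os.stat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[os.path.relpath(path, root)] = (info.st_size, info.st_mtime_ns)
    return state


def changed_paths(previous, current):
    """Return paths that are new or whose signature differs."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)


def poll(root, upload, interval=10.0, rounds=None):
    """Call upload(path) for each changed file, once per polling round."""
    previous = {}
    done = 0
    while rounds is None or done < rounds:
        current = snapshot(root)
        for path in changed_paths(previous, current):
            upload(path)  # hypothetical transfer step
        previous = current
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Even with this kind of filtering, every round still walks the whole tree, and a changed file is re-sent in full, so polling remains more expensive than reacting to individual file events.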

Issues

Other fixes

[^1]: Installing software usually requires resorting to distribution-specific package managers.

dacbd commented 2 years ago

I haven't experimented enough yet, so this might not be applicable, but are you open to having alternative/selectable options for the log capture?

The ideas I have rattling around might be beyond the scope of what task is trying to provide.

0x2b3bfa0 commented 2 years ago

Supporting arbitrary log backends with the same interface doesn't look like a trivial task. Still, you can always install and run software like the CloudWatch Agent as part of the task script.