tinkerbell / cluster-api-provider-tinkerbell

Cluster API Infrastructure Provider
Apache License 2.0

Implement retries for BMC interactions #335

Open chrisdoherty4 opened 7 months ago

chrisdoherty4 commented 7 months ago

BMCs are known to fail or behave unpredictably. When the Hardware resource references BMC data, CAPT uses Rufio to power machines off and on and to configure netboot. The Rufio Tasks/Jobs indicate whether they failed or succeeded. For increased resiliency, we should consider implementing retries in CAPT for the Rufio interactions.