layer5io / meshery-smp-action

GitHub Action for pipelining microservices and Kubernetes performance testing with Meshery
https://layer5.io/projects/nighthawk
Apache License 2.0

Mark test servers for auto-deletion using "end_at" parameter #75

Closed leecalcote closed 1 year ago

leecalcote commented 1 year ago

Current Behavior

The scheduled tests, which run multiple times a day, have faced a few challenges. Notably, one of those challenges is in the cleanup phase once a test is complete. Currently, bare metal servers used for testing are frequently orphaned rather than decommissioned at the end of each test. This leaves an inordinate number of bare metal servers unnecessarily unavailable for use by other projects.

@vielmetti has been most helpful in identifying ways to prevent this from happening.

Desired Behavior

All resources provisioned for a scheduled test are subsequently decommissioned at the end of that same test.

Implementation

Recently @vielmetti pointed this out:

You can create servers that will auto-delete themselves at a time certain, perfect for test runs. See https://deploy.equinix.com/developers/docs/metal/deploy/spot-market/#spot-market-request-creation. You want the “end_at” parameter on the API endpoint for device creation

Slack Message

Acceptance Tests

  1. Test servers are decommissioned at the end of the scheduled testing period.

Contributor Guide

vielmetti commented 1 year ago

It looks like it's here that needs to be changed, and @gyohuangxin had the last edit (it actually looks like a pretty simple fix). We'll need to compute a timestamp.

.github/workflows/scripts/start-cil-runner.sh

What I don't know is, how long are the tests expected to run, at worst case? Can the machines be clobbered in 24, 6, 2 hours?

gyohuangxin commented 1 year ago

@vielmetti Currently, it runs in 30 minutes in the worst case. But considering that we may add more test cases in the future, I think 2 hours is an appropriate deadline.

vielmetti commented 1 year ago

@gyohuangxin The Equinix system bills by the hour, so my recommendation is to set the expiry at, say, 1 hour 50 minutes from the time the server is created. That will catch anything relatively fast without the risk of running to 2:01 and incurring the extra charge.

The example end_at time is in ISO format, e.g.

"end_at": "2020-09-24T05:00:00Z",

which can be generated with something like date -v +110M -Iminutes
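As a rough sketch of the timestamp computation: the -v flag above is BSD/macOS date syntax, while GNU date (typical on Linux runners) uses -d instead. The helper below, compute_end_at, is a hypothetical name and not taken from the repository's script; it tries the GNU form first and falls back to the BSD form, and it forces UTC so the output ends in Z like the example value.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print an ISO 8601 UTC
# timestamp that lies N minutes in the future. Defaults to 110 minutes,
# i.e. 1 hour 50 minutes, to stay inside the second billed hour.
compute_end_at() {
  minutes="${1:-110}"
  # GNU date (Linux) understands -d with a relative offset.
  date -u -d "+${minutes} minutes" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    # BSD date (macOS, FreeBSD) uses -v to adjust the current time.
    date -u -v "+${minutes}M" '+%Y-%m-%dT%H:%M:%SZ'
}

END_AT="$(compute_end_at 110)"
echo "end_at=${END_AT}"
```

Using -u avoids depending on the runner's local time zone, and the explicit format string keeps the output identical across both date implementations.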

gyohuangxin commented 1 year ago

@vielmetti I created a PR to fix this: https://github.com/layer5io/meshery-smp-action/pull/76/files, but I want to confirm the default time zone used by the Equinix system: https://github.com/layer5io/meshery-smp-action/pull/76/files#diff-0e9a22dd7a8bbc870d2ab77c5392deb500a1f946d0e6ff615d75792a2cd8e977R20

gyohuangxin commented 1 year ago

@vielmetti I confirmed that the Equinix system uses the UTC/GMT time zone, so please review the PR, thanks.

vielmetti commented 1 year ago

Thanks @gyohuangxin - I am checking on the semantics of "termination_time", to see if this is expected to work for ordinary instances or only for "spot instances". Will follow up soonest when I can confirm.

vielmetti commented 1 year ago

Confirming two things: one, that our team is working on updated docs for the termination_time option, and two, that based on my understanding this should work as planned.

vielmetti commented 1 year ago

To complete this -

We have updated the Equinix Metal "termination_time" docs at https://deploy.equinix.com/developers/api/metal/#tag/Devices/operation/createDevice to better reflect the use case (ephemeral instances) and to document the time zone question described above.
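For illustration only, a device-creation request carrying termination_time might be assembled roughly as follows. The helper names (build_device_payload, create_device), the hostname, metro, plan, and operating system values, and the METAL_AUTH_TOKEN and PROJECT_ID variables are all placeholders assumed for this sketch, not the values used by the project's runner script. The endpoint and header should be checked against the createDevice documentation linked above.

```shell
#!/bin/sh
# Hypothetical helper: print a createDevice request body that includes an
# expiry. All field values except termination_time are placeholders.
build_device_payload() {
  termination_time="$1"
  cat <<EOF
{
  "hostname": "smp-test-runner",
  "metro": "da",
  "plan": "c3.small.x86",
  "operating_system": "ubuntu_20_04",
  "termination_time": "${termination_time}"
}
EOF
}

# Hypothetical helper: POST the body to the device-creation endpoint.
# Assumes METAL_AUTH_TOKEN and PROJECT_ID are set in the environment.
# Defined here but not invoked, so the sketch makes no network calls.
create_device() {
  build_device_payload "$1" |
    curl -sS -X POST \
      -H "X-Auth-Token: ${METAL_AUTH_TOKEN}" \
      -H "Content-Type: application/json" \
      -d @- \
      "https://api.equinix.com/metal/v1/projects/${PROJECT_ID}/devices"
}

# Print the body for the example timestamp quoted earlier in the thread.
build_device_payload "2020-09-24T05:00:00Z"
```

Keeping the payload builder separate from the request makes it possible to inspect the body in a workflow log before any server is actually provisioned.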

Since this change was deployed last month we have not had any of the previous issues described! That's all good news.