rust-lang / crater

Run experiments across parts of the Rust ecosystem!
https://crater.rust-lang.org

runner going down with scheduled work stalls experiment #577

Closed by Mark-Simulacrum 2 years ago

Mark-Simulacrum commented 3 years ago

If a runner goes down while it still has crates assigned to it, those crates are not rescheduled onto another runner, which stalls the experiment indefinitely.

As a manual workaround, this query puts the crates a worker is currently running (here agent:gcp-1) back in the queue:

update experiment_crates set status = 'queued', assigned_to = null where status = 'running' and assigned_to = 'agent:gcp-1';

This query shows how many crates each worker is currently holding:

select assigned_to, count(*) from experiment_crates where status = 'running' group by assigned_to;

Mark-Simulacrum commented 2 years ago

This is fixed with the new time-based expiration.
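
For readers unfamiliar with the idea, here is a rough sketch of what a time-based expiration could look like. It is not crater's actual implementation. The experiment_crates table and its status and assigned_to columns come from the queries above; the started_at column, the one-hour timeout, and the requeue_expired and demo functions are assumptions made up for illustration. It uses an in-memory SQLite database so it can be run on its own.

```python
import sqlite3

# Hypothetical timeout; the value crater actually uses is not stated in this thread.
EXPIRY_SECONDS = 3600


def requeue_expired(conn, now, expiry_seconds=EXPIRY_SECONDS):
    """Put 'running' crates whose start time is too old back in the queue.

    Assumes a hypothetical started_at column holding a Unix timestamp.
    Returns the number of rows that were requeued.
    """
    cur = conn.execute(
        "UPDATE experiment_crates "
        "SET status = 'queued', assigned_to = NULL, started_at = NULL "
        "WHERE status = 'running' AND started_at < ?",
        (now - expiry_seconds,),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny example table and run one expiration pass over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment_crates ("
        "crate TEXT, status TEXT, assigned_to TEXT, started_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO experiment_crates VALUES (?, ?, ?, ?)",
        [
            ("a", "running", "agent:gcp-1", 0),     # stale: runner went away
            ("b", "running", "agent:gcp-1", 100),   # stale: runner went away
            ("c", "running", "agent:gcp-2", 9000),  # still within the timeout
            ("d", "queued", None, None),            # never assigned
        ],
    )
    requeued = requeue_expired(conn, now=10000, expiry_seconds=3600)
    rows = conn.execute(
        "SELECT crate, status, assigned_to FROM experiment_crates ORDER BY crate"
    ).fetchall()
    return requeued, rows
```

Compared with the manual update above, nobody has to know which runner went down: any crate that has been running longer than the timeout becomes available to other runners again. The trade-off is that a slow but healthy runner may have its work handed out a second time, so the timeout would need to be comfortably longer than a normal crate run.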