Closed praiskup closed 11 months ago
@siteshwar @kdudka what do you think about this?
I think this should work. The tricky part from the user's point of view will be to implement RELEASE_hook in a reliable, race-free way, as we discussed yesterday.
Indeed. I'm not familiar enough with the OSH worker's logic to give useful guidance, but the whole point is to implement a script that just attempts to stop the OSH worker daemon (so it doesn't take new jobs), or fails. Answering "am I actually doing something?" shouldn't be a dilemma for the OSH worker.
@siteshwar I think the RELEASE_hook could work like this:
- set max_load on the specific worker (identified by hostname) to 0 in order to make sure that it does not pick up tasks any more
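A minimal sketch of that flow, purely illustrative: the FakeHub class and its method names are assumptions standing in for the real OSH hub API, not the actual interface. The idea is to forbid new tasks first, then wait for already-running ones to drain before the release proceeds.

```python
import time


class FakeHub:
    """Toy stand-in for the OSH hub; the method names are assumptions,
    not the real hub API."""

    def __init__(self):
        self.max_load = {}
        self.running = {"worker-1": 2}  # pretend two tasks are running

    def set_max_load(self, worker, value):
        self.max_load[worker] = value

    def running_task_count(self, worker):
        # Simulate tasks finishing between polls.
        count = self.running.get(worker, 0)
        if count:
            self.running[worker] = count - 1
        return count


def release_hook(hub, worker, poll_interval=0.01):
    """The RELEASE_hook idea from above: set max_load to 0 so the worker
    takes no new tasks, then poll until the running tasks drain."""
    hub.set_max_load(worker, 0)  # worker must not pick up new tasks
    while hub.running_task_count(worker) > 0:
        time.sleep(poll_interval)
    return True  # now it is safe to stop the worker daemon
```

The ordering matters for race-freeness: max_load goes to 0 before we start counting running tasks, so no task can sneak in between the check and the stop.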
Can this be set as soon as the worker reports a running task to the hub? That way we could ensure it never gets another task.
I am not sure, to be honest. A long time ago I needed to set max_load to 2 in order to make tasks with sub-tasks actually work in OSH. But it could have been due to a kobo bug that has been fixed since then. You can give it a try and, if it works reliably for VersionDiffBuild and ErrataDiffBuild tasks with 1 or 2 sub-tasks, I am fine with that.
I have comments only on the implementation details.
Configuration could look like /etc/resalloc-agents.yaml
I think the configuration could be part of pools.yaml, because even though agents behave differently, there is still a pool of them. Having this in pools.yaml should also give us some features (e.g. multiple pools of agents) for free, and not re-invent the same config options that we already parse.
Distinguishing a pool of agents from our current workers could be done by either:
- agents: true
- resource_type: agent
- cmd_take is defined

Also, it may be a good idea to have this as a part of our standard "loop". It should allow us to have a combination of cmd_new to provision a new agent, cmd_take to use and re-use it, cmd_try_release (we may use the existing cmd_release) to release it for another ticket, and cmd_delete.
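To make the suggestion concrete, a hypothetical pools.yaml fragment could look like the sketch below. The agents/resource_type flags and all command values are assumptions from this discussion, not options resalloc is documented to support:

```yaml
# Hypothetical sketch only; "agents" and the osh-agent-ctl commands
# are assumptions, not existing resalloc options.
osh_agents:
  max: 8
  agents: true                         # or: resource_type: agent
  cmd_new: "osh-agent-ctl provision"
  cmd_take: "osh-agent-ctl start-taking-jobs"
  cmd_release: "osh-agent-ctl drain"   # could become cmd_try_release
  cmd_delete: "osh-agent-ctl deprovision"
```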
All of this may be obvious, and you may already have it figured out. But just in case ...
I think the configuration could be part of pools.yaml
I thought this could be a separate config, and actually even a separate package/daemon, because this would be rather a "client" helper thing (not necessarily running on the same host). I'm also a bit afraid of a logical mix-up of "tickets" with "resources" (a resource is started by a ticket which is taken by another resource, i.e. an agent). Hmm, worth considering anyway, thank you for bringing this up.
Just for the record, a dummy proof-of-concept (which could evolve into a real patch, if considered useful) is in #125.
PR #125 has been merged and released. Closing.
There are use cases like in OpenScanHub/Kobo, where workers/resources are self-standing privileged "agents" that decide what to do about themselves (compete with other agents WRT taking jobs, have privileges to modify a shared database, etc.). This is a different use case than Copr has, where the workers/resources are just non-privileged dummy VMs controlled from the outside via SSH.
In such Agent-like use cases, it's typically possible to guess the ideal number of workers we should have allocated (by introspecting the queue, the currently running tasks, etc.). This number should correspond to the number of tickets taken from the Resalloc system.
To help with maintaining such "agent-like" resources, we could abstract this problem into an "AgentSpawner" daemon doing this loop:
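A minimal sketch of one iteration of such a loop, under assumptions: ideal_agent_count() is a hypothetical heuristic, and take_ticket/release_ticket are placeholders for the real resalloc client calls, not its actual API.

```python
def ideal_agent_count(queued, running, tasks_per_agent=2):
    """Guess how many agents we should have allocated by introspecting
    the task queue; the heuristic here is an illustrative assumption."""
    return -(-(queued + running) // tasks_per_agent)  # ceiling division


def reconcile(tickets, wanted, take_ticket, release_ticket):
    """One iteration of the AgentSpawner loop: take or release
    Resalloc tickets until we hold exactly 'wanted' of them."""
    while len(tickets) < wanted:
        tickets.append(take_ticket())
    while len(tickets) > wanted:
        release_ticket(tickets.pop())
    return tickets
```

A real daemon would call reconcile() periodically (sleep, re-read the queue depth, repeat), so the number of held tickets tracks the ideal agent count over time.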
Configuration could look like /etc/resalloc-agents.yaml:
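The config body is not spelled out here; a purely illustrative sketch might be (every key below is an assumption, not a documented option):

```yaml
# Purely illustrative; all keys are assumptions.
resalloc_server: "http://localhost:49100"
cmd_converge_to: "osh-ideal-agent-count"   # prints the wanted number of agents
cmd_check: "systemctl is-active osh-worker"
cmd_prepare: "systemctl start osh-worker"
cmd_release: "osh-worker-drain"
```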