Closed djmitche closed 6 years ago
(I'm putting this down here as a good student/intern project)
Used this RFC to create an Outreachy proposal.
This will require:
- Modifying the queue to accept this sort of declaration
- Modifying workers to declare this information to the queue
- Writing documentation (based on the existing, partial documentation) for workers
- Modifying the tools site to display this information (perhaps moving the list of workers to a new subpage)
Would be super neat:
workerType
createTask
endpoint to also validate task payload based on the workerType
/provisionerId
oneOf
array where each entry explicitly includes the workerType
/provisionerId
propertiesIndeed, those things would be neat. Note that
Modify queue createTask endpoint to also validate task payload based on the workerType/provisionerId
has some chicken-and-egg issues: the payload schema isn't known until there's a worker, and for provisioned workers there's no worker until there's a task.
I think the general consensus was to just not try to solve that problem, and to accept that the queue will allow createTask with a payload that the worker may (my may not, e.g., after a new deployment!) later reject.
The queue is still not the right place for worker documentation.
In scriptworker, we have modified expected payload schema based on something in the task, generally scopes.
From @djmitche:
Note that
Modify queue createTask endpoint to also validate task payload based on the workerType/provisionerId
has some chicken-and-egg issues: the payload schema isn't known until there's a worker, and for provisioned workers there's no worker until there's a task.
From @jhford:
The queue is still not the right place for worker documentation.
Addressing both of these concerns, maybe the payload schema should be part of the worker type definition, that gets updated when the worker type gets updated. This way it is only sent once per worker type update, not once every time a worker starts up, for every worker of a given worker type.
@escapewindow I think the multiple payloads could be handled with an anyOf
schema element.
Regarding frequent changes, I don't think the schema would be required (and anyway the schema wouldn't be enforced by the queue; see my paragraph about chickens and eggs), so an actively-developed worker implementation could just never send a schema.
Addressing both of these concerns, maybe the payload schema should be part of the worker type definition, that gets updated when the worker type gets updated. This way it is only sent once per worker type update, not once every time a worker starts up, for every worker of a given worker type.
I agree. The worker type definition sounds like a great place for defining a worker type. The only place where that doesn't exist is non-ec2 instances right now. The proposal for a more general purpose provisioner would include support for adding these facilities for non-ec2 instances, even those instance which aren't provisioned with any api at all.
I haven't turned the proposal into an RFC yet, I'll get to that this week, all going to plan.
Would it be flexible, or would we define all supported schema permutations in one superset schema?
You could just use {"type": "object"}
then the only requirement is that task.payload
is an object :)
It would have no restrictions on properties, etc, in fact this is probably what the default schema would be if you don't specify one, thus, also solving petes chicken and egg problem.
The proposal for a more general purpose provisioner would include support for adding these facilities for non-ec2 instances, even those instance which aren't provisioned with any api at all.
So I think this is where we disagree. I don't want another service that all workers register with, and if so, it should be more like a worker-manager. IMO I think that ship sailed when @helfi92 put actions in the queue (it's plausible that it wasn't properly weighted at the time). I'm happy to jump and discuss this, but I think using the queue for this metadata is solid, and the path we've prepared.
Anyways, let's keep that discussion of this RFC. And resolve on vidyo, @jhford please set something up we got a lot closer to consensus last dustin, pete, bstack, you and I talked, let's take another round :)
Per discussion in Berlin, @jhford will be proposing a new arcihtecture for worker management, which will among other things have a "worker manager" service where this information will be stored. This RFC will depend on that one, so it's on hold until that is completed.
That worker manager is #124.
Each workerType can be made up of instances running a different kind of worker (different implementation, or different version or configuration of the same implementation). Rather than try to document all workers in the reference section and leave users to guess what implementation, version, and configuration are running on a particular workerType, we would like to "declare" this information to the queue and display it in the tools interface.
Result
The result would be that a workerType Page like aws-provisioner-v1/gecko-3-b-macosx64 would include documentation for the specific implementation, version, and configuration of that worker, as well as the payload schema for that worker. That documentation can link to the reference section for more in-depth descriptions of features, but should be a comprehensive documentation of the worker's behavior -- not just a description.
Implementation
This will require: