fintelia closed this issue 6 years ago
Good idea! In this taxonomy, a worker thread processes exactly one replica at a time, before returning to the pool, correct?
I think the conventional solution for WorkerIndex-like structures is to use UUIDs/GUIDs (to be able to tell apart workers returning from transient failures and newly started ones).
replica: single running instantiation of the nodes in a domain.
I assume this should say "in one shard of a domain"?
worker: component manages data plane operations
"A process that executes one or more replicas"?
I kind of want to call worker thread something else, because the simplification "thread" has so many meanings. Also, I can't use mod thread and also use std::thread :( "Processor"? "Executor"? Other names?
"Processor" is a term already widely used in the data-flow literature for this purpose (albeit typically statically bound to a vertex). It doesn't seem too wild to say "a processor from the processor pool executes pending work at a replica".
Yeah.. The downside of using "processor" is that CPUs are processors. The alternative @fintelia and I just discussed was instead using "worker" here, and then using a different word for what we now call "worker". "Host" maybe?
"Host" seems suboptimal since you might run >1 Soup daemon/process per host; "process" or "instance" is perhaps more fitting. Or we just go back to the Soup theme and call the daemon process a "bowl" ;-)
Probably none of this matters very much, terms can change ;-)
I like "instance" since it doesn't have an existing well-defined meaning and isn't tied to the Soup theme (a Soup-themed name would be unsuitable for use in papers, etc.)
So each instance has a worker pool of workers that execute the replicas assigned to that instance? And a replica is one copy (well, replica) of one shard of a domain?
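To make the question above concrete, here is a rough sketch of that reading of the taxonomy: an instance owns the replicas assigned to it plus a pool of workers, and each worker handles exactly one replica at a time before returning to the pool. All type, field, and method names (Replica, Instance, run_once, copy) are hypothetical and chosen only for illustration; nothing here is taken from the Soup codebase, and "processing" a replica is reduced to moving it to a done list.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// One copy of one shard of a domain (hypothetical field names).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Replica {
    pub domain: usize,
    pub shard: usize,
    pub copy: usize,
}

/// A daemon process: the replicas assigned to it and the size of its worker pool.
pub struct Instance {
    pub replicas: Vec<Replica>,
    pub workers: usize,
}

impl Instance {
    /// Lets the worker pool drain the pending replicas once.
    /// Returns the replicas that were processed; their order is not deterministic.
    pub fn run_once(&self) -> Vec<Replica> {
        let queue = Arc::new(Mutex::new(self.replicas.clone()));
        let done = Arc::new(Mutex::new(Vec::<Replica>::new()));

        let handles: Vec<_> = (0..self.workers)
            .map(|_| {
                let queue = Arc::clone(&queue);
                let done = Arc::clone(&done);
                thread::spawn(move || loop {
                    // A worker takes exactly one replica at a time; the queue
                    // lock is released at the end of this statement.
                    let next = queue.lock().unwrap().pop();
                    match next {
                        // Stand-in for executing pending work at the replica.
                        Some(replica) => done.lock().unwrap().push(replica),
                        // Nothing left to do: the worker goes back to the pool.
                        None => break,
                    }
                })
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }
        let processed = done.lock().unwrap().clone();
        processed
    }
}
```

With three replicas and two workers, run_once returns all three replicas, each processed by whichever worker picked it up first.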
a6a14cf4a04812986d40043065440923fc005f81 uses the word "Souplet" for "instance", just until we settle on this. "Instance" still feels too generic. The commit only somewhat commits (heh) to this, in the sense that it doesn't also use that name to refer to the WorkerPool that is spawned during local operation, but I think Souplet applies equally there.
As we move to distributed, fault-tolerant Soup, the terms we're using have become increasingly ambiguous and overloaded. I don't love all these names, but we should have terminology for all of the following concepts:
And for these higher-level components...
We should also probably have distinct DomainIndex, ReplicaSetIndex, and ReplicaIndex types, as well as WorkerIndexes to identify individual processes. Depending on how they're implemented, worker indexes may have to be randomly generated to avoid collisions.
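A minimal sketch of what distinct index types could look like, assuming plain newtype wrappers so the compiler rejects mixing one kind of index with another. The wrapped integer types and the random() constructor are assumptions for illustration, not the project's actual definitions. To stay within the standard library, the random value comes from the randomly seeded std RandomState hasher; a real implementation might prefer a UUID or a proper random number generator.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Identifies a domain (assumed representation: a small integer).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DomainIndex(pub usize);

/// Identifies a replica set (assumed representation).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ReplicaSetIndex(pub usize);

/// Identifies a single replica (assumed representation).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ReplicaIndex(pub usize);

/// Identifies an individual process. The value is generated randomly so that
/// a freshly started process is very unlikely to collide with an existing one.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct WorkerIndex(u64);

impl WorkerIndex {
    /// Hypothetical constructor: each RandomState carries fresh random keys,
    /// so finishing an empty hash yields a key-dependent 64-bit value.
    pub fn random() -> Self {
        WorkerIndex(RandomState::new().build_hasher().finish())
    }
}
```

Because each index is its own type, passing a ReplicaIndex where a DomainIndex is expected becomes a compile error rather than a silent bug. A 64-bit random value makes collisions unlikely but not impossible; a 128-bit UUID would lower the odds further.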