mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0

Decide on terminology #44

Closed: fintelia closed 6 years ago

fintelia commented 6 years ago

As we move to a distributed, fault-tolerant Soup, the terms we're using have become increasingly ambiguous and overloaded. I don't love all these names, but we should have terminology for all of the following concepts:

And for these higher level components...

We should probably also have distinct DomainIndex, ReplicaSetIndex, and ReplicaIndex types, as well as a WorkerIndex type to identify individual processes. Depending on how they're implemented, worker indexes may have to be randomly generated to avoid collisions.
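To make the "distinct index types" suggestion concrete, here is a minimal sketch using Rust newtypes. The type names come from the comment above; the integer representations, the `ReplicaAddress` struct, and the `describe` helper are assumptions made purely for illustration, not the project's actual definitions.

```rust
// Hypothetical newtype wrappers. Only the type names are taken from the
// discussion; the field types and everything else are assumptions.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct DomainIndex(pub usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ReplicaSetIndex(pub usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ReplicaIndex(pub usize);

/// Identifies an individual process. A wider integer is assumed here so the
/// value could be randomly generated to avoid collisions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct WorkerIndex(pub u64);

/// Assumed layout for naming one replica. The nesting is illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ReplicaAddress {
    pub domain: DomainIndex,
    pub replica_set: ReplicaSetIndex,
    pub replica: ReplicaIndex,
}

/// Because each index is its own type, passing a `ReplicaIndex` where a
/// `DomainIndex` is expected is a compile-time error rather than a silent bug.
pub fn describe(addr: ReplicaAddress) -> String {
    format!(
        "domain {} / replica set {} / replica {}",
        addr.domain.0, addr.replica_set.0, addr.replica.0
    )
}
```

The benefit of this pattern is that the compiler rejects accidental mix-ups between the different kinds of index, which plain `usize` values would allow.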

ms705 commented 6 years ago

Good idea! In this taxonomy, a worker thread processes exactly one replica at a time, before returning to the pool, correct?

I think the conventional solution for WorkerIndex-like structures is to use UUIDs/GUIDs (to be able to tell apart workers returning from transient failures and newly started ones).
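A rough sketch of that idea, assuming a worker that returns after a transient failure presents the same identifier it used before, while a freshly started one generates a new identifier. The standard library has no UUID type, so this stand-in derives a random 128-bit value from `RandomState`; a real implementation would more likely use a proper UUID library. `WorkerId`, `Registry`, and `Registration` are hypothetical names.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical 128-bit worker identifier (a stand-in for a UUID).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct WorkerId(pub u128);

impl WorkerId {
    /// Generate a random identifier. Each `RandomState` is randomly keyed,
    /// so hashing nothing still produces an unpredictable 64-bit value.
    pub fn random() -> Self {
        let hi = RandomState::new().build_hasher().finish() as u128;
        let lo = RandomState::new().build_hasher().finish() as u128;
        WorkerId((hi << 64) | lo)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Registration {
    /// The identifier has not been seen before: a newly started worker.
    New,
    /// The identifier is already known: a worker returning after a failure.
    Returning,
}

/// Assumed controller-side bookkeeping of known workers.
#[derive(Default)]
pub struct Registry {
    seen: HashSet<WorkerId>,
}

impl Registry {
    pub fn register(&mut self, id: WorkerId) -> Registration {
        // `HashSet::insert` returns true only if the value was not present.
        if self.seen.insert(id) {
            Registration::New
        } else {
            Registration::Returning
        }
    }
}
```

With sequential indexes, a restarted worker and a brand-new one could end up with the same number; with randomly generated identifiers the two cases stay distinguishable.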

jonhoo commented 6 years ago

replica: single running instantiation of the nodes in a domain.

I assume this should say "in one shard of a domain"?

worker: component manages data plane operations

"A process that executes one or more replicas"?

jonhoo commented 6 years ago

I kind of want to call worker thread something else, because the simplification "thread" has so many meanings. Also, I can't use mod thread and also use std::thread :( "Processor"? "Executor"? Other names?

ms705 commented 6 years ago

"Processor" is a term already widely used in the data-flow literature for this purpose (albeit typically statically bound to a vertex). It doesn't seem too wild to say "a processor from the processor pool executes pending work at a replica".

jonhoo commented 6 years ago

Yeah.. The downside of using "processor" is that CPUs are processors. The alternative @fintelia and I just discussed was instead using "worker" here, and then using a different word for what we now call "worker". "Host" maybe?

ms705 commented 6 years ago

"Host" seems suboptimal since you might run >1 Soup daemon/process per host; "process" or "instance" is perhaps more fitting. Or we just go back to the Soup theme and call the daemon process a "bowl" ;-)

Probably none of this matters very much, terms can change ;-)

fintelia commented 6 years ago

I like "instance" since it doesn't have an existing well-defined meaning and isn't tied to the Soup theme (Soup-themed names would be unsuitable for use in papers, etc.)

jonhoo commented 6 years ago

So each instance has a worker pool of workers that execute the replicas assigned to that instance? And a replica is one copy (well, replica) of one shard of a domain?
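The relationship described here can be sketched as a small data model. This is only an illustration of the terminology under discussion: the `Replica` fields, the `Instance` struct, and the `run_round` scheduling loop are all assumptions, and the real worker pool would run threads rather than a sequential loop.

```rust
use std::collections::VecDeque;

/// One copy of one shard of a domain (field layout is an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Replica {
    pub domain: usize,
    pub shard: usize,
    pub copy: usize,
}

/// A hypothetical instance: a pool of workers plus the replicas assigned to
/// it that currently have pending work.
pub struct Instance {
    workers: usize,
    pending: VecDeque<Replica>,
}

impl Instance {
    pub fn new(workers: usize) -> Self {
        Instance { workers, pending: VecDeque::new() }
    }

    /// Mark a replica assigned to this instance as having pending work.
    pub fn enqueue(&mut self, replica: Replica) {
        self.pending.push_back(replica);
    }

    /// One scheduling round: each worker picks up at most one replica,
    /// processes it, and then returns to the pool.
    pub fn run_round(&mut self) -> Vec<(usize, Replica)> {
        let mut assignments = Vec::new();
        for worker in 0..self.workers {
            match self.pending.pop_front() {
                Some(replica) => assignments.push((worker, replica)),
                None => break, // no more pending work this round
            }
        }
        assignments
    }
}
```

With two workers and three pending replicas, the first round hands out two replicas and the second round the remaining one, matching the "one replica at a time per worker" reading earlier in the thread.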

jonhoo commented 6 years ago

a6a14cf4a04812986d40043065440923fc005f81 uses the word "Souplet" in place of "instance", just until we settle on this. "Instance" still feels too generic. The commit only somewhat commits (heh) to this, in the sense that it doesn't also use that name to refer to the WorkerPool that is spawned during local operation, but I think Souplet applies equally there.