Open · lmatz opened this issue 7 months ago
> If we are able to manually schedule in such a way right now, we can do experiments, especially with a real workload, to verify whether the improvement is decent enough.
A simple example would be to compare the performance of 6 actors on a single machine with that of 2 actors on each of 3 machines, by specifying `--parallelism` on start-up. The result should be pretty predictable: the single node outperforms, since there is no remote shuffling involved and cache locality is better.
However, I believe this only holds when there are sufficient resources for a single machine to run 6 actors. What if the business needs significant scale, say, 600 or 6000 cores? In such cases we have no choice but to schedule the job in a distributed manner.
> The result should be pretty predictable: the single node outperforms, since there is no remote shuffling involved
This is a good point; it gives us one more reason to colocate the actors of a job as much as possible.
> What if the business needs significant scale, say, 600 or 6000 cores? In such cases we have no choice but to schedule the job in a distributed manner.
We keep the existing option, i.e. scheduling all the actors of a job as evenly as possible (maybe the title of the issue gives a false impression), and we also give users the choice to schedule a job onto a specific subset of compute nodes.
In my experience, if a job requires 600 or 6000 cores, that job is likely to be taken good care of and to occupy a dedicated cluster, which makes it a special case.
Besides, among existing use cases, (1) the parallelism of a single job is << the sum of the parallelism of all the jobs, which gives us room to rearrange the actors.
There does exist a handful of jobs with the maximum parallelism (i.e. the total number of CPUs in the cluster). But due to (1), in reality they only get roughly `maximum parallelism * total CPUs / the sum of the parallelism of all the jobs` CPUs' worth of the total resources.
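To make the formula concrete, here is a hypothetical example (all numbers are assumed purely for illustration): suppose a cluster has 64 CPUs and hosts streaming jobs whose parallelism sums to 640. A job created with the maximum parallelism of 64 then effectively gets about

$$\frac{64 \times 64}{640} = 6.4 \ \text{CPUs},$$

i.e. roughly 10% of the cluster, rather than the full 64 CPUs its parallelism suggests on paper.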
By properly scheduling these maximum-parallelism jobs onto a subset of compute nodes (which means somewhat less parallelism on paper), they may in fact (1) acquire more resources, as fewer jobs share those nodes with them, and (2) suffer less interference from the shuffle overhead and the storage overhead.
I think it has the potential to be a good cloud-only feature, and the selling point is to automatically choose a good `streaming_parallelism` for each job.

This issue has been open for 60 days with no activity.
If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the `no-issue-activity` label.
You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄
Correct me if I am wrong, but I believe that right now the meta node tries to schedule the actors/parallel units (not sure which is the correct terminology) of a streaming job across all the CNs as evenly as possible.
As explained in https://www.notion.so/risingwave-labs/RFC-Implement-Shared-Nothing-Local-Compaction-7f58ebd9022345e69eb308b906090b3f?pvs=4:
I think we can deduce from these two claims that if we schedule all the actors of a streaming job to the same CN, then those inefficiencies no longer exist, because the job is essentially running in a 1-CN cluster. (Not necessarily 1 CN, but the fewer CNs, the better.)
In real-life workloads, we often observe that users have a large number of streaming jobs, and a large percentage of them do NOT want to occupy all the parallel units, i.e. `streaming_parallelism` << the maximum they could use. In such cases, we have the opportunity to schedule each of them onto a single CN (while still being able to fully utilize all the resources, because there is a large number of streaming jobs), which increases efficiency.

If we are able to manually schedule in such a way right now, we can do experiments, especially with a real workload, to verify whether the improvement is decent enough.
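To make the idea concrete, below is a minimal sketch of the two placement strategies, under the assumption that placement is a simple capacity-constrained assignment problem. This is not RisingWave's actual scheduler; the types and function names (`ComputeNode`, `spread_evenly`, `pack_tightly`) are made up purely for illustration.

```rust
// Hypothetical sketch: spreading vs. packing a job's actors across CNs.
// NOT RisingWave code; all names and numbers are illustrative.

struct ComputeNode {
    id: u32,
    capacity: usize, // total parallel units on this CN
    used: usize,     // parallel units already occupied by other jobs
}

impl ComputeNode {
    fn free(&self) -> usize {
        self.capacity.saturating_sub(self.used)
    }
}

/// Roughly the current behavior: spread the job's actors over all CNs
/// as evenly as possible, in round-robin fashion.
fn spread_evenly(nodes: &mut [ComputeNode], actors: usize) -> Vec<u32> {
    assert!(nodes.iter().map(|n| n.free()).sum::<usize>() >= actors);
    let len = nodes.len();
    let mut placement = Vec::with_capacity(actors);
    let mut i = 0;
    while placement.len() < actors {
        let n = &mut nodes[i % len];
        if n.free() > 0 {
            n.used += 1;
            placement.push(n.id);
        }
        i += 1;
    }
    placement
}

/// The alternative discussed above: pack the job's actors onto as few CNs
/// as possible, preferring the CN with the most free parallel units,
/// so that most (ideally all) shuffles stay node-local.
fn pack_tightly(nodes: &mut [ComputeNode], actors: usize) -> Vec<u32> {
    assert!(nodes.iter().map(|n| n.free()).sum::<usize>() >= actors);
    let mut placement = Vec::with_capacity(actors);
    let mut remaining = actors;
    // Fill the emptiest nodes first, one node at a time.
    nodes.sort_by_key(|n| std::cmp::Reverse(n.free()));
    for n in nodes.iter_mut() {
        let take = remaining.min(n.free());
        n.used += take;
        placement.extend(std::iter::repeat(n.id).take(take));
        remaining -= take;
        if remaining == 0 {
            break;
        }
    }
    placement
}

fn main() {
    // A job with streaming_parallelism = 6 on a 3-CN cluster, 8 cores each,
    // where some parallel units are already taken by other jobs.
    let mut a = vec![
        ComputeNode { id: 1, capacity: 8, used: 2 },
        ComputeNode { id: 2, capacity: 8, used: 1 },
        ComputeNode { id: 3, capacity: 8, used: 3 },
    ];
    let mut b = vec![
        ComputeNode { id: 1, capacity: 8, used: 2 },
        ComputeNode { id: 2, capacity: 8, used: 1 },
        ComputeNode { id: 3, capacity: 8, used: 3 },
    ];
    println!("spread: {:?}", spread_evenly(&mut a, 6)); // actors land on all 3 CNs
    println!("pack:   {:?}", pack_tightly(&mut b, 6));  // all 6 actors land on CN 2
}
```

With the same 6-actor job on this 3-CN cluster, the first strategy spreads the actors over all three nodes (forcing remote shuffles), while the second lands all six actors on a single node, which is exactly the colocation discussed above.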
Comments are welcome!