Open xiechengsheng opened 7 years ago
hi Xiecheng, sorry for the misunderstanding. PyDockerMonitor was just the first version of this project, in which I implemented it as a standalone service that communicated with YARN through RPC, as you assumed. I later found that approach frustrating and hard to debug, so I dropped it.
My final implementation lives entirely inside YARN and uses only Java; you can check my commit logs to see which parts of the YARN code I changed. Most of the code I wrote is in the NodeManager and the Capacity Scheduler.
Also, if you have just started working on the Hadoop ecosystem, note that the newly released Hadoop 3.0 has a completely new implementation of Docker management that is not compatible with Hadoop 2.7.
Wei Chen
Hi @yncxcw, sorry to disturb you again. I have another question, about the application scenario of the big-c system. As many papers mention, long jobs consume most of a cluster's resources, and if short jobs are scheduled after long jobs, head-of-line blocking occurs. But I wonder whether the head-of-line blocking problem really exists in production clusters. As the wiki for Alibaba's public cluster trace data tells us, resource utilization in real production clusters is often below 50%, so in my opinion more than half of the servers in a production cluster have enough idle resources to run the short jobs, and we would not see head-of-line blocking there. What is your opinion on head-of-line blocking in real production clusters?
Hi, xiecheng
That's fine. I am happy to discuss research questions.
For head-of-line blocking: it really does exist in some production systems, since small jobs are often blocked behind long jobs. But it is largely a matter of scheduling strategy: how to partition jobs into different categories, and how to isolate resources between long and short jobs (for example, by configuring different queue quotas).
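The effect described above can be seen with a toy queueing sketch (this is an illustration with made-up job lengths, not big-c code): with one shared FIFO queue, short jobs wait behind a long job, while splitting jobs into isolated queues (as with separate Capacity Scheduler queue quotas) removes that wait.

```python
# Toy illustration of head-of-line blocking on a single-slot node:
# one shared FIFO queue vs. isolating short jobs in their own queue.
# Job runtimes are made-up numbers, not from any real trace.

def fifo_wait_times(jobs):
    """Return each job's queueing delay when jobs run one at a time, in order."""
    waits, clock = {}, 0
    for name, runtime in jobs:
        waits[name] = clock
        clock += runtime
    return waits

jobs = [("long", 100), ("short-1", 1), ("short-2", 1)]

# One shared FIFO queue: short jobs queue behind the long job.
shared = fifo_wait_times(jobs)

# Two isolated queues (think: separate scheduler queues with their own
# quotas): short jobs no longer wait behind the long job.
long_q = fifo_wait_times([j for j in jobs if j[1] >= 10])
short_q = fifo_wait_times([j for j in jobs if j[1] < 10])

print(shared["short-1"])   # 100 -> blocked behind the long job
print(short_q["short-1"])  # 0   -> runs immediately in its own queue
```

The 10-unit cutoff for "long" vs. "short" is arbitrary here; in practice the partitioning policy is exactly the scheduling-strategy question raised above.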
The second situation, where cluster utilization is low, is a different problem. It occurs because the system needs to guarantee a very strict SLA for some high-priority workloads (like HBase or Solr), in which case plenty of resources must be reserved to absorb fluctuations in request load. Those reserved resources are what show up as the roughly 50% of cluster capacity that looks wasted.
In fact, 1 is a special case of 2: to ensure the SLA of short jobs and avoid head-of-line blocking, the cluster has to reserve part of its resources so that a burst of short jobs will not be blocked by long jobs.
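A back-of-envelope sketch of the reservation argument above (the numbers are assumptions, not measurements): if capacity must be provisioned for the peak load of an SLA-bound workload, the average utilization of that capacity is bounded by the average-to-peak ratio.

```python
# Back-of-envelope sketch (made-up numbers): a latency-critical service
# that must absorb peak load without SLA violations is provisioned for
# the peak, so on average it uses only avg_load / peak_load of its
# reserved capacity. A 2x peak-to-average ratio caps utilization at 50%.

def average_utilization(avg_load, peak_load):
    """Fraction of peak-provisioned capacity used at average load."""
    return avg_load / peak_load

print(average_utilization(avg_load=40, peak_load=80))  # 0.5
```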
@yncxcw Thanks again for your detailed explanation.
Thanks in advance:smile:~
hi, xiecheng
That's OK.
For 1, yes, the motivation of these projects is to minimize queueing delays, whether jobs queue at the master node (as in YARN) or at the slave nodes (as in Sparrow).
For 2, yes, our design goal is a mechanism for "preemption without killing", as in a traditional OS. One thing to note is that we preempt before jobs are scheduled to the target nodes, since the resource manager has a full picture of cluster utilization and can make the optimal decision.
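A minimal sketch of the "preemption without killing" idea (this is not big-c's actual code, and for brevity the decision is shown node-locally, whereas the discussion above places it at the resource manager with its cluster-wide view): a long task is suspended, releasing its resources while keeping its progress (think `docker pause` or the cgroup freezer), then can be resumed later.

```python
# Sketch of "preemption without killing": instead of killing a long
# task to make room for a short one, suspend it (progress preserved,
# resources released) and resume it later. Names and the capacity
# model here are illustrative assumptions, not big-c's real API.

class Task:
    def __init__(self, name, demand):
        self.name, self.demand, self.state = name, demand, "RUNNING"

    def suspend(self):
        self.state = "SUSPENDED"   # resources released, progress kept

    def resume(self):
        self.state = "RUNNING"

class Node:
    def __init__(self, capacity):
        self.capacity, self.tasks = capacity, []

    def free(self):
        running = sum(t.demand for t in self.tasks if t.state == "RUNNING")
        return self.capacity - running

    def schedule(self, task):
        # Suspend running tasks until the new task fits, then place it.
        for victim in self.tasks:
            if self.free() >= task.demand:
                break
            if victim.state == "RUNNING":
                victim.suspend()
        if self.free() >= task.demand:
            self.tasks.append(task)
            return True
        return False

node = Node(capacity=8)
long_task = Task("long", demand=8)
node.tasks.append(long_task)

short_task = Task("short", demand=2)
node.schedule(short_task)
print(long_task.state)   # SUSPENDED, not killed
print(short_task.state)  # RUNNING
```

Once the short task finishes, the suspended task can simply be resumed with `long_task.resume()`; nothing it computed is lost, which is the key difference from kill-based preemption.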
Wei
@yncxcw That's great, thanks for your patient explanation. So the big-c system doesn't need extra dedicated servers to run short jobs, and the final goal of preemptive scheduling is to make full use of the cluster's resources and reduce cost. Your explanation cleared up my misunderstandings about this system.
That's OK~
Hi Chen Wei, I have read your ATC paper and looked through this repository, and I have some questions about the project's structure.
Thanks in advance:smile:~