mturilli opened 4 years ago
Experiment 2 needs a new table. The CPU counts will be rounded up to full nodes: for example, 42/84/126/168/210/252 cores, i.e., 1/2/3/4/5/6 nodes, are the possible resource sizes. To maximize scheduling, the number of tasks is bumped up from 40 to 100. A possible table is:
Run ID | #T_1000s | #T_100s | #T_10s | #G(T_1000s) | #G(T_100s) | #G(T_10s) | #Cores | TTX ideal | RU |
---|---|---|---|---|---|---|---|---|---|
1 | 100 | 100 | 100 | 1 | 1 | 2 | 252 | 1000s | ? |
2 | 100 | 100 | 100 | 1 | 1 | 10 | 210 | 1000s | ? |
3 | 100 | 100 | 100 | 1 | 2 | 50 | 168 | 1000s | ? |
4 | 100 | 100 | 100 | 1 | 4 | 100 | 126 | 1000s | ? |
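To fill in the RU column, here is a minimal sketch of how I would compute #Cores, ideal TTX and RU for one run, assuming one core per task, 42 cores per node, and RU defined as busy core-seconds over allocated core-seconds (these are my assumptions, to be checked):

```python
import math

CORES_PER_NODE = 42  # node size used above (42/84/.../252 cores)

# (duration in seconds, number of tasks, number of generations) per task type;
# the generation counts below correspond to Run 4 of the proposed table.
task_types = [
    (1000, 100, 1),    # T_1000s
    (100,  100, 4),    # T_100s
    (10,   100, 100),  # T_10s
]

# Cores needed so that each task type finishes within its assigned number of
# generations, rounded up to full nodes.
cores_raw = sum(math.ceil(n / g) for _, n, g in task_types)
cores = math.ceil(cores_raw / CORES_PER_NODE) * CORES_PER_NODE

# TTX: the task type whose generations take longest dominates.
ttx = max(d * g for d, _, g in task_types)

# Resource utilization: busy core-seconds over allocated core-seconds.
busy = sum(d * n for d, n, _ in task_types)
ru = busy / (cores * ttx)

print(f'cores={cores}, TTX={ttx}s, RU={ru:.2f}')
```

Under the same assumptions, Run 4 gives 126 cores and RU ≈ 0.88, while Run 1 (generations 1/1/2) gives 252 cores and RU ≈ 0.44: the generation counts trade allocated cores for utilization at constant TTX.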
I was hoping we could use published results to discuss the rate of launch (which is important for the O(10) seconds, if not O(100) seconds, tasks) -- to convince the reader that, although not super efficient, our rate-of-launch overhead is adequate for the scales proposed. Suggestions of which published plots, if any, we can leverage? If not, we should consider doing some (relatively quick) experiments to capture the relevant performance of task launching.
I think the latest paper we published with ORNL contains that information, albeit indirectly. We can derive the scheduling rate from the experiments that @lee212 just ran, but that would not give us the scheduler's performance upper bound.
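Even without a dedicated experiment, something as simple as the sketch below over the per-task launch timestamps from those runs would give a lower bound on the launch rate (how we extract the timestamps from the profiles, and the numbers shown, are placeholders):

```python
# Estimate a lower bound on the task launch rate from launch timestamps.
launch_times = [0.0, 0.6, 1.1, 1.9, 2.4]  # hypothetical example data (seconds)

launch_times = sorted(launch_times)
span = launch_times[-1] - launch_times[0]
rate = (len(launch_times) - 1) / span if span > 0 else float('inf')

# For O(10)s tasks, the launch overhead is tolerable only if a full generation
# of tasks can be launched in a time much shorter than the task duration.
generation_size = 100   # e.g., 100 concurrent 10s tasks, as in Experiment 2
task_duration = 10      # seconds
launch_overhead = generation_size / rate
print(f'launch rate ~{rate:.1f} tasks/s, '
      f'~{launch_overhead:.0f}s to launch {generation_size} tasks '
      f'({launch_overhead / task_duration:.0%} of a {task_duration}s task)')
```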
Proposed table for Experiment 3
system | T_7200s (mdrun) | T_3900s (CVAE) | T_840s (TICA) | T_600s (Inference) | T_5s (RLDock) |
---|---|---|---|---|---|
ntl9 | 12 | 10 | 10 | 1 | 1 |
ntl9 | 24 | 10 | 10 | 1 | 1 |
ntl9 | 48 | 10 | 10 | 1 | 1 |
ntl9 | 96 | 10 | 10 | 1 | 1 |
---- | --- | --- | --- | -- | - |
ntl9 | 60 | 50 | 50 | 5 | 1 |
ntl9 | 120 | 50 | 50 | 5 | 1 |
ntl9 | 240 | 50 | 50 | 5 | 1 |
ntl9 | 480 | 50 | 50 | 5 | 1 |
---- | --- | --- | --- | -- | - |
ntl9 | 120 | 100 | 100 | 10 | 1 |
ntl9 | 240 | 100 | 100 | 10 | 1 |
ntl9 | 480 | 100 | 100 | 10 | 1 |
ntl9 | 960 | 100 | 100 | 10 | 1 |
@lee212, are the entries temporal durations or numbers of concurrent tasks?
Exp 4
Run ID | #G_MD | #G_CVAE | #G_TICA | #G_Inference | #G_RLDock | #Cores | TTX Ideal (s) |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 252 | 7200 |
2 | 1 | 1 | 2 | 3 | 1 | 210 | 7200 |
3 | 1 | 1 | 10 | 10 | 1 | 168 | 7200 |
4 | 1 | 2 | 10 | 10 | 1 | 126 |
Hi @lee212, what is the total number of cores you used for Exp 3? In Exp 1, we showed that, given a fixed number of cores with which we can run all the available tasks, resource utilization increases only when the number of long-running tasks dominates over that of short-running tasks. Now that I am writing the paper, I see this might not need another experiment, as Exp 1 is convincing in itself.
Looking at your table for Experiment 3, I had initially thought it described the scalability experiment we decided to do, say Exp 6 for lack of a better name. Was this the case or was I wrong? If the latter, do we have a table describing the scalability experiment?
About Exp 4, why did you choose 12:10:10:10:1 for the number of tasks? Is this what Arvind wants to use?
The table for Exp 3 needs changes; it does not have correct numbers like Exp 1. I also agree it is not necessary, as Exp 1 satisfies its objective.
For Exp 4, the number of tasks was preserved from the real workload, the ntl9 physical system, which starts with 12 MD simulations, 10 CVAE and 10 TICA training tasks, 1 inference task and 1 reinforcement learning task. Right, I don't think Exp 4 needs to use the same task distribution as Exp 2 (e.g., 100:100:100), but, like Exp 2, it shows resource utilization as a function of the number of cores while reaching the ideal TTX.
New scoping experiments for ICPP paper
Inverted plot for resource utilization for reduced # of cores
This looks... unexpected...
Experiments
Here are some ideas and notes on the experiments we may want to design, set up and run for the HPDC paper. Happy to discuss each experiment further if you find this interesting/useful.
Experiment 1
Done, results at https://github.com/radical-experiments/hyperspace_experiments/blob/master/analysis/nonuniform_tasks/nonuniform_tasks.ipynb
Experiment 2
Building on Experiment 1, this shows better resource utilization by keeping TTX constant while reducing the total amount of resources required and controlling the number of execution generations per task duration.
Design
Setup
Legend
Run: Number of the experiment run
T_1000s: Number of tasks with 1000s duration
T_100s: Number of tasks with 100s duration
T_10s: Number of tasks with 10s duration
G(T_1000s): Number of generations for executing tasks with 1000s duration
G(T_100s): Number of generations for executing tasks with 100s duration
G(T_10s): Number of generations for executing tasks with 10s duration
Cores: Number of cores used to execute all the given tasks
Notes:
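One way to make the relation between these quantities explicit (this is my reading of the proposed table, not an agreed definition), with $d_i$ the duration of task type $i$ (1000s, 100s, 10s) and 42 the cores per node:

$$\mathrm{Cores} = \left\lceil \frac{\sum_i \lceil T_i / G_i \rceil}{42} \right\rceil \cdot 42 \qquad \mathrm{TTX} = \max_i \left( G_i \cdot d_i \right) \qquad \mathrm{RU} = \frac{\sum_i T_i \cdot d_i}{\mathrm{Cores} \cdot \mathrm{TTX}}$$

The generation counts $G_i$ are then chosen so that TTX stays equal to the ideal TTX, i.e., one generation of the longest task type.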
Experiment 3
Shows that the results observed in Experiment 1 apply to real-life workflows with tasks that have an actual distribution of execution times. This shows the insight we can get about resource utilization for an actual workflow. Same as Experiment 1 but with the distribution of task execution times measured by executing one of the scientific workflows of the paper (choose the most interesting one from a scientific point of view).
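As a sketch of what the measured distribution could look like in practice, one option is to group the measured task durations of the chosen workflow by task type and use the median per type as the representative duration, matching the T_<duration>s notation used in the tables above; the numbers below are placeholders, not actual measurements:

```python
from statistics import median

# Hypothetical measured durations (seconds) per task type, e.g. collected
# from one execution of the ntl9 workflow; the values are placeholders.
measured = {
    'mdrun':     [7100.0, 7250.0, 7180.0],
    'CVAE':      [3850.0, 3920.0],
    'TICA':      [830.0, 845.0, 860.0],
    'inference': [598.0],
    'RLDock':    [5.2],
}

# Median duration per type as the representative duration for the tables.
for task_type, durations in measured.items():
    print(f'T_{median(durations):.0f}s ({task_type}): {len(durations)} samples')
```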
Experiment 4
Shows we can maximize resource utilization while keeping the workflow execution time as close as feasible to its ideal total execution time. Same as Experiment 2 but with the same distribution of task execution times as in Experiment 3, and only with the maximal resource utilization, i.e., the run with the minimal number of cores.
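A minimal sketch of how the minimal-cores run could be picked, using the same simple model as in the Experiment 2 table: one core per task (only for the sketch; the real per-task core/GPU requirements would replace this), 42-core nodes, and the ideal TTX set by one generation of the longest task type. The task counts are the ntl9 ones (12/10/10/1/1); the search ranges are illustrative.

```python
import math
from itertools import product

CORES_PER_NODE = 42
# (duration in seconds, number of tasks) per type, using the Exp 3 / ntl9 counts.
types = [(7200, 12), (3900, 10), (840, 10), (600, 1), (5, 1)]
ttx_ideal = 7200  # one generation of the longest task type

best = None
# Search over small generation counts per task type; the longest type stays
# at one generation so the ideal TTX is preserved.
for gens in product([1], range(1, 11), range(1, 11), range(1, 4), [1]):
    # TTX with these generation counts: each type runs G back-to-back waves.
    ttx = max(d * g for (d, _), g in zip(types, gens))
    if ttx > ttx_ideal:
        continue
    raw = sum(math.ceil(n / g) for (_, n), g in zip(types, gens))
    cores = math.ceil(raw / CORES_PER_NODE) * CORES_PER_NODE
    if best is None or cores < best[0]:
        best = (cores, gens)

print(f'minimal cores: {best[0]} with generations {best[1]}')
```

With the actual per-task core/GPU requirements plugged in, the same loop would give candidate values for the #Cores column of the Exp 4 table.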
Experiment 5
Applies what we learned with the previous experiments to an actual workflow, maximizing its resource utilization while minimizing its execution time for a given resource. Analyze the execution of the scientific workflow used for Experiment 3 and define all the unique ratios between heterogeneous tasks. For example, imagine that, across the execution of the whole workflow, we have 4 distinct ratios of 3 types of tasks. We would then have 4 cases of Experiment 1. We would apply the equation derived for Experiment 2 and calculate the optimal resource utilization as done in Experiment 4.
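A minimal sketch of the "unique ratios" step, assuming we have (or can reconstruct from the profiles) the start and end time of every task together with its type; the trace below is a placeholder, not real data:

```python
from math import gcd
from functools import reduce

# Hypothetical trace: (task_type, start_s, end_s) for each task of the
# workflow used in Experiment 3.
trace = [
    ('MD',   0,    7200), ('MD',   0,    7200),
    ('CVAE', 0,    3900),
    ('TICA', 3900, 4740), ('TICA', 3900, 4740),
]

# Sample the set of concurrently running tasks at every start/end event and
# collect the distinct type ratios (normalized by their GCD).
events = sorted({t for _, s, e in trace for t in (s, e)})
ratios = set()
for t in events:
    counts = {}
    for typ, s, e in trace:
        if s <= t < e:
            counts[typ] = counts.get(typ, 0) + 1
    if counts:
        g = reduce(gcd, counts.values())
        ratios.add(tuple(sorted((typ, n // g) for typ, n in counts.items())))

print(f'{len(ratios)} distinct ratios:')
for r in sorted(ratios):
    print('  ' + ':'.join(f'{typ}={n}' for typ, n in r))
```

Each distinct ratio found this way would then become one case of Experiment 1.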