tawfiqul-islam / RM_DeepRL

Resource Management with DeepRL using TF Agents

How does the baselines code work? #3

Open chc199949 opened 2 years ago

chc199949 commented 2 years ago

Hello, I have a few questions after reading your paper and code.

First question: regarding action 0, the agent has to consider waiting when a batch job ends and the queue is not empty. Do the baseline algorithms need to consider action 0 as well? For example, does FF consider actions 1-9 or 0-9 each time? Does RR include action 0 when it cycles through the actions?

Second question: regarding the cost calculation, the memory sizes and quantities shown in Table 4 of the paper are not consistent with the cluster settings. Are the unit prices used for the cost calculation in Figure 6 of the paper 0.24, 0.48, and 0.72, respectively?

Third question: given the complex semantics of batch jobs, the queue, action 0, and so on, could you make the constraints of the ILP you used public?

Looking forward to your reply. Thank you very much.

tawfiqul-islam commented 2 years ago

Thanks for your questions!

  1. The baseline algorithms consider action 0 only when there aren't enough resources in the cluster to schedule any new job. For the DRL-based algorithms, however, waiting can result from two things: (a) the scheduling agent is trying to avoid violating the resource capacity constraint of the cluster (similar to what the baselines do), or (b) the scheduling agent decides that, even though there are enough resources to place the new job, waiting for a while might lead to a better value of the optimization objective in the long run.

  2. Yes, the memory size is inconsistent with the table in the paper. You are right about the unit prices used for the cost calculation. Although the settings show 1, 2, and 3 respectively, these were later converted to the actual prices shown in the paper.

  3. You can see the implementations of the baseline algorithms in my other open-source project: https://github.com/tawfiqul-islam/RM-Simulator. The following class implements the ILP policy: https://github.com/tawfiqul-islam/RM-Simulator/blob/master/simulator/src/Policy/ILPScheduler.java
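To make point 1 concrete, here is a minimal sketch of how a baseline could fall back to action 0 only when nothing fits. It is not taken from the RM-Simulator code: the function names, the `cpu`/`mem` resource keys, and the mapping of actions 1..N to VMs are assumptions for illustration, and the check is done per job rather than for the whole queue.

```python
from typing import Dict, List, Tuple

WAIT_ACTION = 0  # action 0: place nothing in this step (assumed encoding)


def fits(vm: Dict[str, float], demand: Dict[str, float]) -> bool:
    """True if the VM has enough free CPU and memory for the demand."""
    return vm["cpu"] >= demand["cpu"] and vm["mem"] >= demand["mem"]


def first_fit_action(free: List[Dict[str, float]], demand: Dict[str, float]) -> int:
    """Return the 1-based index of the first VM that fits, else WAIT_ACTION."""
    for action, vm in enumerate(free, start=1):
        if fits(vm, demand):
            return action
    return WAIT_ACTION


def round_robin_action(
    free: List[Dict[str, float]], demand: Dict[str, float], start: int
) -> Tuple[int, int]:
    """Cycle over the VMs from `start` (0-based); action 0 is never polled.

    Returns (action, next_start). WAIT_ACTION is returned only after a
    full cycle in which no VM could host the demand.
    """
    n = len(free)
    for offset in range(n):
        idx = (start + offset) % n
        if fits(free[idx], demand):
            return idx + 1, (idx + 1) % n
    return WAIT_ACTION, start


free_vms = [{"cpu": 1, "mem": 2}, {"cpu": 4, "mem": 8}]
print(first_fit_action(free_vms, {"cpu": 2, "mem": 4}))    # second VM fits
print(first_fit_action(free_vms, {"cpu": 8, "mem": 16}))   # nothing fits, so wait
```

In this sketch the baselines only ever search actions 1..N; action 0 is a fallback, not a candidate. A DRL agent, by contrast, can pick action 0 even when a placement is feasible.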

Hope it helps.
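For point 2, the conversion from the configured price levels (1, 2, 3) to the unit prices confirmed above (0.24, 0.48, 0.72) can be sketched as a simple lookup. The names `PRICE_BY_LEVEL` and `usage_cost` are hypothetical, and the thread does not state the billing time unit, so the duration is assumed to be expressed in whatever unit the prices refer to.

```python
# Price levels as they appear in the cluster settings, mapped to the
# unit prices discussed in this thread. The mapping table itself is an
# illustrative assumption about how the conversion could be written.
PRICE_BY_LEVEL = {1: 0.24, 2: 0.48, 3: 0.72}


def usage_cost(level: int, duration: float) -> float:
    """Cost of using one VM of the given price level for `duration` time units."""
    if level not in PRICE_BY_LEVEL:
        raise ValueError(f"unknown price level: {level}")
    return PRICE_BY_LEVEL[level] * duration


# Example: total cost of three VMs, one per level, each used for 2 time units.
total = sum(usage_cost(level, 2.0) for level in (1, 2, 3))
print(round(total, 2))
```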

chc199949 commented 2 years ago

Thank you for your reply. After reviewing the Java code, I understand the semantics of action 0 and the logic of the baseline algorithms. But I still have three questions:

  1. CSVReader reads jobs_drl_burst.csv and jobs_drl.csv, which are not in the directory. Could you provide them?

  2. If workload_ jobshigh load.csv or workload jobs low_ load.csv from the directory is used instead, CSVReader reads parsedstr[7] to set the job type, but neither of these two CSVs has such a column.

  3. The dataset settings in this Python project differ from those in the Java code. To run the Java code directly and reproduce the baseline results shown in your paper, could you provide the dataset used in the paper in the version that matches the Java settings? Alternatively, how can the Java settings be generated from the datasets in the Python project? For example, how are T_est and T_D generated from the CSVs in the Python project?
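The column mismatch described in item 2 can be checked before running the Java code. The sketch below is an illustration, not part of either project: it assumes a comma-delimited file, and `rows_missing_column` is a hypothetical helper that reports which rows are too short for an index such as `parsedstr[7]`.

```python
import csv
import io
from typing import Iterable, List

JOB_TYPE_COLUMN = 7  # 0-based index, as in parsedstr[7]


def rows_missing_column(lines: Iterable[str], column: int = JOB_TYPE_COLUMN) -> List[int]:
    """Return 1-based numbers of non-empty rows that have no field at `column`."""
    missing = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if row and len(row) <= column:
            missing.append(row_number)
    return missing


# Inline sample data instead of a real workload file: the first row has
# 7 fields (indices 0..6), the second has 8 (indices 0..7).
sample = io.StringIO("a,b,c,d,e,f,g\na,b,c,d,e,f,g,h\n")
print(rows_missing_column(sample))
```

To check a real file, pass an open file handle instead of the `io.StringIO` sample, for example `rows_missing_column(open(path, newline=""))`, where `path` is the CSV to inspect.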