hongzimao / decima-sim

Learning Scheduling Algorithms for Data Processing Clusters
https://web.mit.edu/decima/

Some questions about your code #30

Open chenjw259 opened 3 years ago

chenjw259 commented 3 years ago

Hello! Could you tell me which versions of the various libraries you used for this code?

Thank you!

hongzimao commented 3 years ago

If I'm not mistaken, we developed the project with TensorFlow 1.14.0 (#5). I think you can use the latest versions of other libraries like numpy and matplotlib.
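For anyone setting this up, here is a quick environment sanity check. It is only a sketch: the TensorFlow 1.14.0 pin is the one version confirmed in this thread, and the other imports are simply the libraries mentioned above.

```python
# Minimal environment check (illustrative; only the TF 1.14.0 pin is
# confirmed in this thread -- numpy/matplotlib are just the other
# libraries mentioned above).
import tensorflow as tf
import numpy as np
import matplotlib

# The repo targets TF 1.x-era APIs, so a 1.x install (ideally 1.14.0) is expected.
assert tf.__version__.startswith("1."), "expected TensorFlow 1.x (ideally 1.14.0)"

print("tensorflow", tf.__version__)
print("numpy", np.__version__)
print("matplotlib", matplotlib.__version__)
```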

chenjw259 commented 3 years ago

Thank you!

chenjw259 commented 3 years ago

Hello, in your code you aggregate the node embeddings into a graph summary. For this part, did you input all of the DAGs into the model, or just a single DAG?

jahidhasanlinix commented 2 years ago

@chenjw259 Regarding your last question, did you ever get an answer? If you were able to figure it out, could you provide some details?

hongzimao commented 2 years ago

Re the DAGs input to the model: we pass in all DAGs.
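To illustrate what "all DAGs" means for the graph summary, here is a minimal numpy sketch of the two-level aggregation: per-DAG summaries first, then one global summary over every DAG currently in the system. The plain sums and function names are illustrative placeholders, not the repo's actual graph summarization network, which applies learned non-linear transforms at each level.

```python
import numpy as np

def summarize(node_embeddings, dag_ids):
    """Two-level aggregation sketch: one summary per DAG, then a single
    global summary over ALL DAGs (not just one). Raw sums stand in for
    the learned aggregation functions used in the actual model."""
    dag_summaries = {}
    for dag in np.unique(dag_ids):
        # sum the embeddings of the nodes belonging to this DAG
        dag_summaries[dag] = node_embeddings[dag_ids == dag].sum(axis=0)
    # the global summary aggregates over every per-DAG summary
    global_summary = np.stack(list(dag_summaries.values())).sum(axis=0)
    return dag_summaries, global_summary

# toy example: 5 nodes spread over 2 DAGs, 3-dimensional embeddings
emb = np.random.rand(5, 3)
ids = np.array([0, 0, 0, 1, 1])
per_dag, global_sum = summarize(emb, ids)
```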

jahidhasanlinix commented 2 years ago

I'm kind of confused about one part of Decima. You mentioned "We evaluate Decima in simulation and in a real Spark cluster".

  1. When I tried to run your code in Spark, it didn't work for me. Could you explain how I can use the Decima code in that simulation software? Can you name it or provide a link to it?

  2. Could you also give some instructions on how to integrate your code into a Spark cluster? I really can't figure it out. I would appreciate your reply and any instructions that would help.

hongzimao commented 2 years ago

This repo is only the simulation part of the project; we did the training purely in simulation. The real Spark integration is through a customized extension: we modified Spark's scheduling module to request scheduling decisions from our customized extension (in Python) via protobuf. That part of the code is a bit intricate and we didn't have enough time to properly refactor it for public use. We'll post an update if we find time to go back and open-source that code. Thanks!
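For readers trying to picture that setup, below is a rough sketch of the request/response loop only. It is not the authors' code: JSON stands in for the protobuf messages used in the real (unreleased) integration, the message fields and the `agent.act` call are hypothetical placeholders, and the modified Spark scheduler is assumed to connect as a client.

```python
import json
import socket
import struct

def _recv_exact(conn, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("scheduler closed the connection")
        buf += chunk
    return buf

def serve_scheduling_decisions(agent, host="localhost", port=9999):
    """Accept length-prefixed requests from a (modified) Spark scheduler
    and reply with the agent's scheduling decision. JSON is used here as
    a stand-in for the protobuf serialization of the real extension."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            while True:
                (length,) = struct.unpack(">I", _recv_exact(conn, 4))
                request = json.loads(_recv_exact(conn, length))  # job/stage state from Spark
                decision = agent.act(request)  # e.g. which stage to run next, parallelism limit
                payload = json.dumps(decision).encode()
                conn.sendall(struct.pack(">I", len(payload)) + payload)
```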

jahidhasanlinix commented 2 years ago

Thank you so much for your response. If you don't mind, could you give me some pointers on which part of Spark's scheduling code you modified (Spark is written in Scala, and when I tried it I got an error on spark-submit), and

how you customize that extension to integrate the Decima code with Spark? Did you change the Spark codebase itself, or just modify part of the scheduling module so that your model runs in the customized extension?

jahidhasanlinix commented 2 years ago

I was reading up on Protobuf (https://developers.google.com/protocol-buffers), but I'm still confused about how the modified scheduler in Spark requests scheduling decisions from your customized extension via protobuf. How is this process actually done? The Decima codebase is large, so it would be nice if you could share some details about this integration process.