hongzimao / decima-sim

Learning Scheduling Algorithms for Data Processing Clusters
https://web.mit.edu/decima/

Average interarrival time #15

Closed larissayukimiz closed 4 years ago

larissayukimiz commented 4 years ago

Hi! Where can I get the interarrival time of the continuous arrivals?

hongzimao commented 4 years ago

For a Poisson process (https://www.probabilitycourse.com/chapter11/11_1_2_basic_concepts_of_the_poisson_process.php), the interarrival time between jobs is drawn from an exponential distribution. In our example, the mean of this distribution is 45 seconds.

If you are asking about the code, the interarrival time is passed in through a global parameter: https://github.com/hongzimao/decima-sim/blob/master/param.py#L34-L35. It is used in the job generation process here: https://github.com/hongzimao/decima-sim/blob/c010dd74ff4b7566bd0ac989c90a32cfbc630d84/spark_env/job_generator.py#L128-L129
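To make the sampling step concrete, here is a minimal sketch of drawing interarrival gaps from an exponential distribution with a given mean. It is not the repository's code: the constant `MEAN_INTERARRIVAL_SEC` and the helper `sample_interarrivals` are hypothetical names, and the 45-second value is simply the one quoted above.

```python
import random

# Assumed value, taken from the 45 seconds quoted in this thread; the name is hypothetical.
MEAN_INTERARRIVAL_SEC = 45.0


def sample_interarrivals(num_jobs, mean_sec, seed=0):
    """Draw `num_jobs` exponential interarrival gaps with mean `mean_sec`."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), not the mean itself.
    return [rng.expovariate(1.0 / mean_sec) for _ in range(num_jobs)]


gaps = sample_interarrivals(100_000, MEAN_INTERARRIVAL_SEC)
sample_mean = sum(gaps) / len(gaps)
print(f"sample mean interarrival: {sample_mean:.1f} s")
```

With enough samples the printed mean should land close to the configured value, which is a quick way to check that the parameter is being interpreted as a mean and not as a rate.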

Hope this helps!

larissayukimiz commented 4 years ago

I see. So it isn't generated automatically when I run the code?

hongzimao commented 4 years ago

Sorry, I don't understand your question. What isn't generated automatically? The job generator is called at the beginning to generate a batch of jobs (initial jobs and streaming jobs). It just generates the job size and assigns an interarrival time to each of them. The job generator is called here: https://github.com/hongzimao/decima-sim/blob/c010dd74ff4b7566bd0ac989c90a32cfbc630d84/spark_env/env.py#L371-L373
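The batch generation described above could look roughly like the following sketch. It only illustrates the idea (initial jobs present at time 0, streaming jobs spaced by exponential gaps) and is not the simulator's implementation: `Job`, `generate_jobs`, and all argument names are hypothetical, and job sizes are left out for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    arrival_time: float  # seconds since the start of the episode


def generate_jobs(num_init_jobs, num_stream_jobs, mean_interarrival_sec, seed=0):
    """Create all jobs up front: initial jobs at t=0, streaming jobs afterwards."""
    rng = random.Random(seed)
    jobs = [Job(job_id=i, arrival_time=0.0) for i in range(num_init_jobs)]
    t = 0.0
    for i in range(num_stream_jobs):
        # Each streaming job arrives one exponential gap after the previous one.
        t += rng.expovariate(1.0 / mean_interarrival_sec)
        jobs.append(Job(job_id=num_init_jobs + i, arrival_time=t))
    return jobs


jobs = generate_jobs(num_init_jobs=3, num_stream_jobs=5, mean_interarrival_sec=45.0)
for job in jobs:
    print(job.job_id, round(job.arrival_time, 1))
```

Because every arrival time is fixed when the batch is created, nothing further has to be generated while the episode runs; the environment only needs to release each job once the clock reaches its arrival time.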

larissayukimiz commented 4 years ago

Sorry, Mao. I was talking about how you calculated the average interarrival time (where you got the values).

hongzimao commented 4 years ago

It's just a hyperparameter. You can try different values to generate different loads. We chose an average interarrival time that puts the system load at ~80%. You can calculate the average load from the job distribution (which gives you the average job size) and the average interarrival time (which gives you the total work arriving in a window). The work that arrives in a window divided by the server capacity over that window is the load; multiply by 100 for a percentage.
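The load calculation above can be written out as a short sketch. The numbers below are made up purely to illustrate the arithmetic and are not taken from the repository or its workload; `average_load`, `interarrival_for_load`, `AVG_JOB_WORK`, and `NUM_EXECUTORS` are hypothetical names, and capacity is assumed to be one executor-second of work per executor per second.

```python
def average_load(avg_job_work, mean_interarrival, num_executors):
    """Fraction of capacity used: work arriving per second / work served per second.

    avg_job_work:      average total work per job, in executor-seconds
    mean_interarrival: average time between job arrivals, in seconds
    num_executors:     executors available (each serves 1 executor-second per second)
    """
    arrival_rate = 1.0 / mean_interarrival       # jobs per second
    offered_work = arrival_rate * avg_job_work   # executor-seconds arriving per second
    return offered_work / num_executors


def interarrival_for_load(avg_job_work, num_executors, target_load):
    """Invert the formula: mean interarrival time that yields `target_load`."""
    return avg_job_work / (target_load * num_executors)


# Hypothetical numbers, chosen only to illustrate the arithmetic.
AVG_JOB_WORK = 1800.0  # executor-seconds per job (assumed)
NUM_EXECUTORS = 50     # cluster size (assumed)

load = average_load(AVG_JOB_WORK, 45.0, NUM_EXECUTORS)
print(f"load at 45 s interarrival: {load:.0%}")
print(f"interarrival for 80% load: {interarrival_for_load(AVG_JOB_WORK, NUM_EXECUTORS, 0.8):.1f} s")
```

With these assumed inputs, 1800 / 45 = 40 executor-seconds of work arrive per second against a capacity of 50, so the load is 0.8. Running the inverse function for a different target load gives the interarrival value to try as the hyperparameter.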

larissayukimiz commented 4 years ago

Ohhh I get it now. So it was on default down here. Sorry for the misunderstanding! And thank you for the explanation!