oar-team / batsim

Batsim: Infrastructure simulator for job and I/O scheduling
GNU Lesser General Public License v3.0
30 stars 15 forks source link

Simultaneous events sent in different batsim messages #67

Closed Mema5 closed 2 years ago

Mema5 commented 2 years ago

Description of the problem

I experience a case in my simulations where two simultaneous events are sent in different batsim events. See the log bellow where the following happens simultaneously (timestamp 226640.000000):

[master_host0:server:(2) 226640.000000] [server/INFO] Job w0!2222 SUBMITTED. 2223 jobs submitted so far
[master_host0:server:(2) 226640.000000] [server/INFO] Job w0!2223 SUBMITTED. 2224 jobs submitted so far
[master_host0:server:(2) 226640.000000] [server/INFO] Job w0!2224 SUBMITTED. 2225 jobs submitted so far
[master_host0:Scheduler REQ-REP:(7155) 226640.000000] [network/INFO] Sending '{"now":226640.000000,"events":[{"timestamp":226640.000000,"type":"JOB_SUBMITTED","data":{"job_id":"w0!2222","job":{"id":"w0!2222","profile":"700","res":1,"subtime":226640.000000,"user_id":43,"walltime":6000.000000}}},{"timestamp":226640.000000,"type":"JOB_SUBMITTED","data":{"job_id":"w0!2223","job":{"id":"w0!2223","profile":"950","res":1,"subtime":226640.000000,"user_id":43,"walltime":6000.000000}}},{"timestamp":226640.000000,"type":"JOB_SUBMITTED","data":{"job_id":"w0!2224","job":{"id":"w0!2224","profile":"910","res":1,"subtime":226640.000000,"user_id":43,"walltime":6000.000000}}}]}'
[master_host0:Scheduler REQ-REP:(7155) 226640.000000] [network/INFO] Received '{"now":226640.0,"events":[{"timestamp":226640.0,"type":"EXECUTE_JOB","data":{"job_id":"w0!2222","alloc":"34","mapping":{"0":"0"}}},{"timestamp":226640.0,"type":"EXECUTE_JOB","data":{"job_id":"w0!2223","alloc":"34","mapping":{"0":"0"}}},{"timestamp":226640.0,"type":"SET_RESOURCE_STATE","data":{"resources":"36","state":"0"}}]}'
[minos_41:switch OFF 36:(7158) 226640.000000] [pstate/INFO] Switching machine 36 ('minos_41') ON. Passing in virtual pstate 3 to do so
[minos_41:switch OFF 36:(7158) 226640.000000] [pstate/INFO] Computing 1 flop to simulate time & energy cost of switch ON
[minos_37:job_w0!2129:(6987) 226640.000000] [jobs_execution/INFO] Job 'w0!2129' finished in time (success)
[master_host0:server:(2) 226640.000000] [server/INFO] Job w0!2129 has COMPLETED. 2040 jobs completed so far
[master_host0:Scheduler REQ-REP:(7159) 226640.000000] [network/INFO] Sending '{"now":226640.000000,"events":[{"timestamp":226640.000000,"type":"JOB_COMPLETED","data":{"job_id":"w0!2129","job_state":"COMPLETED_SUCCESSFULLY","return_code":0,"alloc":"31"}}]}'
[master_host0:Scheduler REQ-REP:(7159) 226640.000000] [network/INFO] Received '{"now":226640.0,"events":[]}'

According to @mpoquet this is normal but could be changed:

If the decision component is ready (AKA if all its previous decisions have been injected, as seen on the figure attached) the Batsim main actor (named server IIRC) will directly forward events to the decision component. If two separate events arrive at the server at the same timestamp, they will most likely be sent in separate Batsim messages for this reason. One possible improvement is to make Batsim's main actor only forward events to the decision component when the main actor's input mailbox is empty. If this is important for you, I'll accept a merge request with this new feature as long as this behavior optional (typically, a command-line option) :wink:.

I would like to change this behavior and send all simultaneous events at once in order to achieve consistency with another version of my scheduler.

MWE

I tried to isolate this problem in a MWE but did not manage. I used energy_platform_homogeneous_no_net.xml, batsched scheduler filler and a custom workload:

{
    "nb_res": 1,
    "jobs": [
        {"id":1, "subtime":0, "walltime": 100, "res": 1, "profile": "hg_10"},
        {"id":2, "subtime":10, "walltime": 100, "res": 1, "profile": "hg_10"}
    ],

    "profiles": {
        "hg_10": {"type": "parallel_homogeneous", "cpu": 1e9,"com": 0}
    }
}

In the simulation, the messages are sent grouped, as can be seen in the batsim log extract below:

[host0:job_w0!1:(5) 10.000000] [jobs_execution/INFO] Job 'w0!1' finished in time (success)
[master_host:server:(2) 10.000000] [server/INFO] Job w0!2 SUBMITTED. 2 jobs submitted so far
[master_host:server:(2) 10.000000] [server/INFO] Job w0!1 has COMPLETED. 1 jobs completed so far
[master_host:Scheduler REQ-REP:(6) 10.000000] [network/INFO] Sending '{"now":10.000000,"events":[{"timestamp":10.000000,"type":"JOB_SUBMITTED","data":{"job_id":"w0!2","job":{"id":"w0!2","subtime":10,"walltime":100,"res":1,"profile":"hg_10"}}},{"timestamp":10.000000,"type":"JOB_COMPLETED","data":{"job_id":"w0!1","job_state":"COMPLETED_SUCCESSFULLY","return_code":0,"alloc":"0"}},{"timestamp":10.000000,"type":"NOTIFY","data":{"type":"no_more_static_job_to_submit"}}]}'
[master_host:Scheduler REQ-REP:(6) 10.000000] [network/INFO] Received '{"now":10.0,"events":[{"timestamp":10.0,"type":"EXECUTE_JOB","data":{"job_id":"w0!2","alloc":"0","mapping":{"0":"0"}}}]}'

Do you know why this difference between my execution and the MWE?

mpoquet commented 2 years ago

I could not reproduce the issue. I also tried to execute several jobs from the same scheduler message to see if the job completion events are sent in the same message. Yes, the two JOB_COMPLETED were sent in the same message. I used the cluster512.xml platform on the following workload.

{
    "nb_res": 2,
    "jobs": [
        {"id":1, "subtime":0, "walltime": 100, "res": 1, "profile": "hg_10"},
        {"id":2, "subtime":0, "walltime": 100, "res": 1, "profile": "hg_10"},
        {"id":3, "subtime":10, "walltime": 100, "res": 1, "profile": "hg_10"}
    ],

    "profiles": {
        "hg_10": {"type": "parallel_homogeneous", "cpu": 1e9,"com": 0}
    }
}

batsim log

[master_host0:Scheduler REQ-REP:(3) 0.000000] [network/INFO] Received '{"now":0.0,"events":[]}'
[master_host0:server:(2) 0.001800] [server/INFO] Job w0!1 SUBMITTED. 1 jobs submitted so far
[master_host0:server:(2) 0.001800] [server/INFO] Job w0!2 SUBMITTED. 2 jobs submitted so far
[master_host0:Scheduler REQ-REP:(4) 0.001800] [network/INFO] Sending '{"now":0.001800,"events":[{"timestamp":0.001800,"type":"JOB_SUBMITTED","data":{"job_id":"w0!1","job":{"id":"w0!1","subtime":0,"walltime":100,"res":1,"profile":"hg_10"}}},{"timestamp":0.001800,"type":"JOB_SUBMITTED","data":{"job_id":"w0!2","job":{"id":"w0!2","subtime":0,"walltime":100,"res":1,"profile":"hg_10"}}}]}'
[master_host0:Scheduler REQ-REP:(4) 0.001800] [network/INFO] Received '{"now":0.0018,"events":[{"timestamp":0.0018,"type":"EXECUTE_JOB","data":{"job_id":"w0!1","alloc":"0","mapping":{"0":"0"}}},{"timestamp":0.0018,"type":"EXECUTE_JOB","data":{"job_id":"w0!2","alloc":"1","mapping":{"0":"0"}}}]}'
[a0:job_w0!1:(5) 1.002400] [jobs_execution/INFO] Job 'w0!1' finished in time (success)
[a1:job_w0!2:(6) 1.003000] [jobs_execution/INFO] Job 'w0!2' finished in time (success)
[master_host0:server:(2) 1.004000] [server/INFO] Job w0!1 has COMPLETED. 1 jobs completed so far
[master_host0:server:(2) 1.005600] [server/INFO] Job w0!2 has COMPLETED. 2 jobs completed so far
[master_host0:Scheduler REQ-REP:(7) 1.005600] [network/INFO] Sending '{"now":1.005600,"events":[{"timestamp":1.004000,"type":"JOB_COMPLETED","data":{"job_id":"w0!1","job_state":"COMPLETED_SUCCESSFULLY","return_code":0,"alloc":"0"}},{"timestamp":1.005600,"type":"JOB_COMPLETED","data":{"job_id":"w0!2","job_state":"COMPLETED_SUCCESSFULLY","return_code":0,"alloc":"1"}}]}'
[master_host0:Scheduler REQ-REP:(7) 1.005600] [network/INFO] Received '{"now":1.0056,"events":[]}'
[master_host0:server:(2) 10.001800] [server/INFO] Job w0!3 SUBMITTED. 3 jobs submitted so far
[master_host0:Scheduler REQ-REP:(8) 10.001800] [network/INFO] Sending '{"now":10.001800,"events":[{"timestamp":10.001800,"type":"JOB_SUBMITTED","data":{"job_id":"w0!3","job":{"id":"w0!3","subtime":10,"walltime":100,"res":1,"profile":"hg_10"}}}]}'
[master_host0:Scheduler REQ-REP:(8) 10.001800] [network/INFO] Received '{"now":10.0018,"events":[{"timestamp":10.0018,"type":"EXECUTE_JOB","data":{"job_id":"w0!3","alloc":"0","mapping":{"0":"0"}}}]}'

Furthermore it seems that Batsim already makes sure that the server's mailbox is empty before calling the decision process (code here). I do not think that we can do better than this from Batsim so I close this issue.

I think that this is what happens when you execute your big workload: