Closed Mema5 closed 2 years ago
I could not reproduce the issue. I also tried to execute several jobs from the same scheduler message to see if the job completion events are sent in the same message. Yes, the two JOB_COMPLETED were sent in the same message. I used the cluster512.xml
platform on the following workload.
{
"nb_res": 2,
"jobs": [
{"id":1, "subtime":0, "walltime": 100, "res": 1, "profile": "hg_10"},
{"id":2, "subtime":0, "walltime": 100, "res": 1, "profile": "hg_10"},
{"id":3, "subtime":10, "walltime": 100, "res": 1, "profile": "hg_10"}
],
"profiles": {
"hg_10": {"type": "parallel_homogeneous", "cpu": 1e9,"com": 0}
}
}
batsim log
[master_host0:Scheduler REQ-REP:(3) 0.000000] [network/INFO] Received '{"now":0.0,"events":[]}'
[master_host0:server:(2) 0.001800] [server/INFO] Job w0!1 SUBMITTED. 1 jobs submitted so far
[master_host0:server:(2) 0.001800] [server/INFO] Job w0!2 SUBMITTED. 2 jobs submitted so far
[master_host0:Scheduler REQ-REP:(4) 0.001800] [network/INFO] Sending '{"now":0.001800,"events":[{"timestamp":0.001800,"type":"JOB_SUBMITTED","data":{"job_id":"w0!1","job":{"id":"w0!1","subtime":0,"walltime":100,"res":1,"profile":"hg_10"}}},{"timestamp":0.001800,"type":"JOB_SUBMITTED","data":{"job_id":"w0!2","job":{"id":"w0!2","subtime":0,"walltime":100,"res":1,"profile":"hg_10"}}}]}'
[master_host0:Scheduler REQ-REP:(4) 0.001800] [network/INFO] Received '{"now":0.0018,"events":[{"timestamp":0.0018,"type":"EXECUTE_JOB","data":{"job_id":"w0!1","alloc":"0","mapping":{"0":"0"}}},{"timestamp":0.0018,"type":"EXECUTE_JOB","data":{"job_id":"w0!2","alloc":"1","mapping":{"0":"0"}}}]}'
[a0:job_w0!1:(5) 1.002400] [jobs_execution/INFO] Job 'w0!1' finished in time (success)
[a1:job_w0!2:(6) 1.003000] [jobs_execution/INFO] Job 'w0!2' finished in time (success)
[master_host0:server:(2) 1.004000] [server/INFO] Job w0!1 has COMPLETED. 1 jobs completed so far
[master_host0:server:(2) 1.005600] [server/INFO] Job w0!2 has COMPLETED. 2 jobs completed so far
[master_host0:Scheduler REQ-REP:(7) 1.005600] [network/INFO] Sending '{"now":1.005600,"events":[{"timestamp":1.004000,"type":"JOB_COMPLETED","data":{"job_id":"w0!1","job_state":"COMPLETED_SUCCESSFULLY","return_code":0,"alloc":"0"}},{"timestamp":1.005600,"type":"JOB_COMPLETED","data":{"job_id":"w0!2","job_state":"COMPLETED_SUCCESSFULLY","return_code":0,"alloc":"1"}}]}'
[master_host0:Scheduler REQ-REP:(7) 1.005600] [network/INFO] Received '{"now":1.0056,"events":[]}'
[master_host0:server:(2) 10.001800] [server/INFO] Job w0!3 SUBMITTED. 3 jobs submitted so far
[master_host0:Scheduler REQ-REP:(8) 10.001800] [network/INFO] Sending '{"now":10.001800,"events":[{"timestamp":10.001800,"type":"JOB_SUBMITTED","data":{"job_id":"w0!3","job":{"id":"w0!3","subtime":10,"walltime":100,"res":1,"profile":"hg_10"}}}]}'
[master_host0:Scheduler REQ-REP:(8) 10.001800] [network/INFO] Received '{"now":10.0018,"events":[{"timestamp":10.0018,"type":"EXECUTE_JOB","data":{"job_id":"w0!3","alloc":"0","mapping":{"0":"0"}}}]}'
Furthermore it seems that Batsim already makes sure that the server's mailbox is empty before calling the decision process (code here). I do not think that we can do better than this from Batsim so I close this issue.
I think that this is what happens when you execute your big workload:
job_w0!2129
actor next, to manage the end of the job (that will send a JOB_COMPLETED event to the server actor)job_w0!2129
has not been executed yet). As the mailbox is empty, Batsim calls the decision component, which looks fine to me.
Description of the problem
I experience a case in my simulations where two simultaneous events are sent in different batsim events. See the log bellow where the following happens simultaneously (timestamp 226640.000000):
According to @mpoquet this is normal but could be changed:
I would like to change this behavior and send all simultaneous events at once in order to achieve consistency with another version of my scheduler.
MWE
I tried to isolate this problem in a MWE but did not manage. I used energy_platform_homogeneous_no_net.xml, batsched scheduler
filler
and a custom workload:In the simulation, the messages are sent grouped, as can be seen in the batsim log extract below:
Do you know why this difference between my execution and the MWE?