Closed: Mema5 closed this issue 2 years ago
Hmm, I confirm that I see the same behavior you described when reading the Batsim logs.
Potential issues:
Can you try to write a small SimGrid program to determine whether the error comes from SimGrid or from Batsim? You can, for example, use mwe.cpp from SimGrid's issue 37 as a base.
Hi @mpoquet, and thanks for your answer. My platform file and workload file are pretty straightforward; I pasted them at the end of this post.
I will try to identify whether the problem comes from SimGrid or Batsim and follow up.
Workload file:
{
"description": "Test binpacking.",
"nb_res": 12,
"jobs": [
{"id": "0", "profile": "blast_vm_large", "res": 4, "subtime": 0},
{"id": "1", "profile": "blast_vm_large", "res": 4, "subtime": 5000}
],
"profiles": {
"blast_vm_large": {"com": 0.0, "cpu": 1.657216e14, "type": "parallel_homogeneous_total"}
}
}
Platform file:
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4.1">
<config id="General">
<prop id="contexts/stack-size" value="16"></prop>
<prop id="contexts/guard-size" value="0"></prop>
</config>
<zone id="toy_g5k" routing="Full">
<host id="master_host" speed="100Mf">
<prop id="wattage_per_state" value="100:100:200" />
<prop id="wattage_off" value="10" />
</host>
<host id="taurus_0" core="12" pstate="0" speed="11.77Gf, 1e-9Mf, 0.166666666666667f, 0.006666666666667f">
<prop id="wattage_per_state" value="100:100:217, 9.75:9.75:9.75, 100:100:100, 125:125:125" />
<prop id="wattage_off" value="10" />
<prop id="sleep_pstates" value="1:2:3" />
</host>
<host id="taurus_1" core="12" pstate="0" speed="11.77Gf, 1e-9Mf, 0.166666666666667f, 0.006666666666667f">
<prop id="wattage_per_state" value="100:100:217, 9.75:9.75:9.75, 100:100:100, 125:125:125" />
<prop id="wattage_off" value="10" />
<prop id="sleep_pstates" value="1:2:3" />
</host>
</zone>
</platform>
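A quick way to double-check what the platform file declares for each host is sketched below in Python. It is only an illustration: `host_summary` is a hypothetical helper, not part of Batsim or SimGrid, and the embedded XML is a trimmed copy of the platform file above containing a single compute host.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the platform file above (one compute host only).
PLATFORM = """<platform version="4.1">
<zone id="toy_g5k" routing="Full">
<host id="taurus_0" core="12" pstate="0" speed="11.77Gf, 1e-9Mf, 0.166666666666667f, 0.006666666666667f">
<prop id="sleep_pstates" value="1:2:3" />
</host>
</zone>
</platform>"""


def host_summary(xml_text):
    """Return {host id: (core count, list of per-pstate speed strings)}."""
    root = ET.fromstring(xml_text)
    summary = {}
    for host in root.iter("host"):
        # Assumption: a host without a "core" attribute has a single core.
        cores = int(host.get("core", "1"))
        speeds = [s.strip() for s in host.get("speed").split(",")]
        summary[host.get("id")] = (cores, speeds)
    return summary


for name, (cores, speeds) in host_summary(PLATFORM).items():
    print(name, cores, speeds)
```

For this platform, the helper should report 12 cores and four pstate speeds for `taurus_0`, the first of which (pstate 0) is the computing speed.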
Good intuition, @mpoquet: the problem comes from SimGrid.
I submitted the issue on SimGrid's repo: issue #95.
It seems the SimGrid issue was fixed. Can we close this one too?
I also think that SimGrid#95 fixed this issue, but the following should be done before marking this issue as closed.
NB: A new SimGrid version should be released very soon.
I can confirm that this bug is fixed with Batsim 4.1.0 and SimGrid 3.31.0, which is the latest Batsim release on NUR-Kapack.
Hello, today I bumped into an issue with my experiments. I don't know yet whether the bug comes from my side, from Batsim, or from SimGrid.
Bug description
I have two identical jobs with 4 parallel tasks each, submitted at t=0 and t=5000.
Case 1: I schedule job0 and job1 on the multicore machine0, which stays idle between the two executions.
Case 2: after job0 finishes, I switch off machine0, then switch it back on at t=5000 to run job1.
Expected behavior: job0 and job1 should have the same execution time. In Case 2, however, job1 takes exactly 4 times longer than job0, a factor that matches its number of parallel executors, as if machine0 were no longer running as a multicore machine after rebooting.
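To make the factor of 4 concrete, here is a rough back-of-the-envelope check in Python. It assumes that a `parallel_homogeneous_total` profile splits its `cpu` amount evenly across the job's 4 executors, and that each executor gets one dedicated core at the pstate 0 speed (11.77 Gf) from the platform file. The variable names are illustrative only, and the numbers are estimates, not Batsim or SimGrid output.

```python
# Back-of-the-envelope estimate, not Batsim/SimGrid output.
TOTAL_FLOPS = 1.657216e14  # "cpu" of profile blast_vm_large
NB_EXECUTORS = 4           # "res" of each job
CORE_SPEED = 11.77e9       # pstate 0 speed of the compute host, in flop/s

# Assumption: the total amount of work is split evenly among the executors.
flops_per_executor = TOTAL_FLOPS / NB_EXECUTORS

# Expected case (assumed): each executor runs on its own core.
multicore_time = flops_per_executor / CORE_SPEED

# Suspected faulty case (assumed): all executors share a single core.
single_core_time = TOTAL_FLOPS / CORE_SPEED

print(f"one core per executor: {multicore_time:.0f} s")
print(f"all executors on one core: {single_core_time:.0f} s")
print(f"ratio: {single_core_time / multicore_time:.0f}")
```

Under these assumptions a job should take about 3520 s when its executors run in parallel, and about 14080 s if they are serialized on one core, which would match the observed slowdown by a factor of 4.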
Versions
Logs
batsim.log with verbosity debug for this experiment.