oar-team / batsim

Batsim: Infrastructure simulator for job and I/O scheduling
GNU Lesser General Public License v3.0

Rejecting all jobs causes Batsim to deadlock #34

Closed: adfaure closed this issue 7 years ago

adfaure commented 7 years ago

Hello, I don't know if it matters, but if you reject all jobs, Batsim deadlocks with:

[master_host0:server:(2) 604796.002400] /home/adfaure/Projects/batsim/src/server.cpp:700: [root/CRITICAL] Left simulation loop, but the simulation does NOT seem finished...
Backtrace (displayed in process server):
---> xbt_backtrace_display_current at ??:?, 0x7f2631d97fd2
---> server_process(int, char**) at /home/adfaure/Projects/batsim/src/server.cpp:88, 0x49c5b2
---> std::_Function_handler<void (), simgrid::xbt::MainFunction<int (*)(int, char**)> >::_M_invoke(std::_Any_data const&) at ??:?, 0x7f2631d40449
---> simgrid::kernel::context::RawContext::wrapper(void*) at ??:?, 0x7f2631d2e512
Aborted

As a result, the scheduler never finishes and waits forever.

You can test it with my scheduler on any workload:

cargo run --bin rej # Rejects all jobs, one by one, until the end.

mpoquet commented 7 years ago

I can reproduce the issue at 7eab359 by running cargo run --bin rej. Investigating.

mpoquet commented 7 years ago

Batsim forgot to check whether the simulation was finished after a job rejection. Fixed in 611e7bf. Thanks for the report :)

adfaure commented 7 years ago

That was fast! Thanks