oar-team / batsim

Batsim: Infrastructure simulator for job and I/O scheduling
GNU Lesser General Public License v3.0

Force waiter processes to finish #50

Closed lccasagrande closed 5 years ago

lccasagrande commented 5 years ago

Describe what the pull request does
This pull request forces Batsim to wait for all waiter processes to finish before ending the simulation. This ensures that processes that rely on "CALL_ME_LATER" do not hang.

Checklist
Branch name.

Branch content.

mpoquet commented 5 years ago

I'm afraid this PR will not be accepted as is, as it breaks some tests and I am not sure prohibiting this kind of simulation pattern is desirable :(.

On both my machine and the CI (log), the energy-minimal tests do not pass. These tests usually finish with a warning ([master_host:server:(2) 261.520000] ${BATSIM}/src/server.cpp:147: [server/WARNING] Left simulation loop, but some waiter processes (used to manage the CALL_ME_LATER message) are running.), but it looks like they now fail an assertion instead.

The algorithm in these tests (batsched's energy_bf_idle_sleeper) is periodic: it sends CALL_ME_LATER every T seconds, from the first job release to the end of the simulation (SIMULATION_ENDS). The algorithm could stop sending those CALL_ME_LATER requests as soon as the no_more_static_job_to_submit NOTIFY has been received, but this is not desired here, as we want to see the impact of the periodic energy decisions when the queue is empty at the end of the simulation.
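
To illustrate the periodic pattern described above, here is a minimal Python sketch. It is a toy model, not Batsim or batsched code: the function name and its parameters are hypothetical, and it assumes the simulation loop is left when the last job completes. Under those assumptions, a scheduler that re-arms a call every T seconds always has one unanswered CALL_ME_LATER at that moment.

```python
def pending_calls_at_end(first_release, period, last_job_end):
    """Toy model of a periodic scheduler (hypothetical helper).

    The scheduler asks to be called back every `period` seconds, starting
    at `first_release`. Returns the wake-up times that are answered before
    `last_job_end`, and the ones still pending at that time.
    """
    answered = []
    next_call = first_release + period
    # Each answered call immediately re-arms the next one.
    while next_call <= last_job_end:
        answered.append(next_call)
        next_call += period
    # The last re-armed call falls after the last job has completed.
    return answered, [next_call]


print(pending_calls_at_end(0, 10, 25))  # ([10, 20], [30])
```

With a period of 10 and a last job ending at t=25, the call requested for t=30 is still waiting when the simulation loop is left, which matches the situation the warning above reports.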

Can you detail why you needed this change? Maybe the termination can be improved in the protocol to fit the existing use cases and yours.

Best, Millian

mpoquet commented 5 years ago

Quick update: @Mommessc is working on a simplification of Batsim's termination code right now (which includes forcing waiter processes to finish) and I updated Batsched's energy algorithms so their termination is cleaner.

mpoquet commented 5 years ago

The recent merge in 9cc51b9 should solve this issue. Thanks for pointing out that this part of the code was not clean!

Mommessc commented 5 years ago

Actually I cleaned up an unused boolean in 849ab4a. To recap the changes: SIMULATION_ENDS is now sent alone in its own message, and Batsim waits for the waiter processes to finish before sending it. Hope these changes include what you intended to do with this PR!
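
As a rough illustration of the termination order recapped above, the following Python sketch models the last messages of a simulation. It is a toy model under stated assumptions, not the code merged in 9cc51b9: the function name, the tuple-based message format, and the use of REQUESTED_CALL as the name of the reply to CALL_ME_LATER are all assumptions made for this example.

```python
import heapq


def final_messages(pending_waiters, now):
    """Toy model of the end of a simulation (hypothetical helper).

    `pending_waiters` holds the timestamps of unanswered CALL_ME_LATER
    requests. Each returned message is (timestamp, [event names]).
    """
    heap = list(pending_waiters)
    heapq.heapify(heap)
    messages = []
    # Answer every pending waiter first, in timestamp order.
    while heap:
        now = max(now, heapq.heappop(heap))
        messages.append((now, ["REQUESTED_CALL"]))  # assumed reply name
    # Only then send SIMULATION_ENDS, alone in its own message.
    messages.append((now, ["SIMULATION_ENDS"]))
    return messages


for message in final_messages([30, 27], now=25):
    print(message)
```

In this sketch, a scheduler that requested calls at t=27 and t=30 receives both answers before the final SIMULATION_ENDS message, so no request is left unanswered.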

lccasagrande commented 5 years ago

I'm sorry I did not properly test it.

I had a problem with Batsim when the simulation finished before answering my last "CALL_ME_LATER". I was expecting that Batsim would answer all my calls before finishing.

This simple PR solved my problem. Thanks for solving it.