oar-team / batsim

Batsim: Infrastructure simulator for job and I/O scheduling
GNU Lesser General Public License v3.0

Unpredictable test crashes #15

Closed mpoquet closed 7 years ago

mpoquet commented 7 years ago

Since ZeroMQ commit d8a26cdae, the tests seem to be nondeterministic. Several commits worked fine on my laptop, failed on the CI, and then passed when I reran them on the CI.

The problem might be in Batsim, in pybatsim, or in the exec1 and execN experiment scripts. I suspect execN should do some TCP management before running the scheduler, such as making sure that the port is not already in use.

mpoquet commented 7 years ago

Commit c5111bc restored the previous exec1 behaviour, which consisted of waiting for the socket to be usable before running the Batsim and Sched processes.
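
For readers wondering what "waiting for the socket to be usable" can look like, here is a minimal sketch. It is not the code from commit c5111bc: the function names `port_is_free` and `wait_for_free_port`, the default host, and the timeout values are all hypothetical. It assumes that "usable" means "a plain TCP bind on the port succeeds", which may be stricter or looser than what the experiment scripts actually check.

```python
import socket
import time


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another socket still holds the port.
            return False
    # The probe socket is closed here, so the port is released again.
    return True


def wait_for_free_port(port, host="127.0.0.1", timeout=5.0, poll_interval=0.1):
    """Poll until the port can be bound, or give up after `timeout` seconds.

    Returns True if the port became free, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A launcher script could call `wait_for_free_port(port)` with whatever port the simulator and the scheduler are configured to use, and start the two processes only when it returns True. There is still a small race between the check and the real bind, so this narrows the window rather than closing it.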

execN_ssh4 might still cause issues.

mpoquet commented 7 years ago

Exec1 and ExecN are now based on asyncio rather than execo. This may fix the issue.
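
As an illustration of the asyncio approach, the sketch below launches two processes concurrently and collects their exit codes. It is not the actual Exec1/ExecN code: `run_command`, `run_pair`, and the timeout handling are hypothetical, and the real scripts may manage output, ports, and failures quite differently.

```python
import asyncio


async def run_command(args, timeout):
    """Run one command; return its exit code, or None if it timed out."""
    proc = await asyncio.create_subprocess_exec(*args)
    try:
        return await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        # Kill the process and reap it so that no zombie is left behind.
        proc.kill()
        await proc.wait()
        return None


async def run_pair(sim_args, sched_args, timeout=60.0):
    """Run the simulator and scheduler commands concurrently.

    Returns [simulator_exit_code, scheduler_exit_code].
    """
    return await asyncio.gather(
        run_command(sim_args, timeout),
        run_command(sched_args, timeout),
    )
```

Calling `asyncio.run(run_pair(sim_args, sched_args))` with the two command lines as lists of strings returns both exit codes, with None marking a process that had to be killed after the timeout.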

mpoquet commented 7 years ago

Issue seems solved, closing the thread.