hobbit-project / platform

HOBBIT benchmarking platform
GNU General Public License v2.0

Experiment is not terminated after time is out #252

Open clin99 opened 6 years ago

clin99 commented 6 years ago

An experiment keeps running even after the max runtime becomes negative. Is this normal?

vpapako commented 6 years ago

I am also unable to remove the experiment from the queue.

MichaelRoeder commented 6 years ago

It seems like the thread responsible for terminating experiments that take too much time was canceled. I restarted the controller.

I created a log file for further analysis.
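For context, one well-known way a `java.util.Timer` thread ends up canceled is an unchecked exception escaping from a `TimerTask`: the timer thread dies and the timer rejects all further tasks. Whether that is what happened here is not established by this thread, and it is only an assumption that the termination logic runs on a `java.util.Timer`. The class and method names below are hypothetical; this is a minimal sketch of the general behavior and of guarding a task against it.

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerCancelDemo {

    /** Wraps a body so that a RuntimeException cannot kill the timer thread. */
    public static TimerTask guarded(Runnable body) {
        return new TimerTask() {
            @Override
            public void run() {
                try {
                    body.run();
                } catch (RuntimeException e) {
                    System.err.println("task failed, timer keeps running: " + e);
                }
            }
        };
    }

    /** Wraps a body without any protection. */
    public static TimerTask unguarded(Runnable body) {
        return new TimerTask() {
            @Override
            public void run() {
                body.run();
            }
        };
    }

    /**
     * Schedules one failing task, waits, then tries to schedule another task.
     * Returns true if the timer still accepts tasks afterwards.
     */
    public static boolean acceptsTasksAfterFailure(boolean guard) {
        Timer timer = new Timer("demo-timer", true); // daemon thread
        Runnable failing = () -> {
            throw new IllegalStateException("simulated failure");
        };
        timer.schedule(guard ? guarded(failing) : unguarded(failing), 0);
        try {
            Thread.sleep(500); // give the timer thread time to run the task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            timer.schedule(unguarded(() -> { }), 0);
            return true;
        } catch (IllegalStateException e) {
            return false; // the timer was canceled by the escaped exception
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + acceptsTasksAfterFailure(false));
        System.out.println("guarded:   " + acceptsTasksAfterFailure(true));
    }
}
```

A `ScheduledExecutorService` behaves differently (a failing periodic task is suppressed, but the executor itself stays usable), which is one reason it is often preferred over `Timer`.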

denkv commented 4 years ago

Another one: The experiment http://w3id.org/hobbit/experiments#1598593574572 took too much time. Forcing termination.

It looks like the platform initiated the termination of the benchmark service, but not the system service.

Probably the thing that got stuck is listTasks in the Docker API library:

"Timer-0" #35 prio=5 os_prio=0 cpu=571.76ms elapsed=15419.73s tid=0x00007ff6c4876800 nid=0x46 waiting on condition  [0x00007ff60a4ad000]
    - parking to wait for  <0x000000060c935200> (a jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.5/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.5/AbstractQueuedSynchronizer.java:1039)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.5/AbstractQueuedSynchronizer.java:1345)
    at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
    at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2639)
    at com.spotify.docker.client.DefaultDockerClient.version(DefaultDockerClient.java:517)
    at com.spotify.docker.client.DefaultDockerClient.assertApiVersionIsAbove(DefaultDockerClient.java:2772)
    at com.spotify.docker.client.DefaultDockerClient.listTasks(DefaultDockerClient.java:2002)
    at org.hobbit.controller.docker.ContainerManagerImpl.getContainerExitCode(ContainerManagerImpl.java:702)
    at org.hobbit.controller.docker.ContainerStateObserverImpl$1.run(ContainerStateObserverImpl.java:97)
    at java.util.TimerThread.mainLoop(java.base@11.0.5/Timer.java:556)
    at java.util.TimerThread.run(java.base@11.0.5/Timer.java:506)

https://github.com/hobbit-project/platform/blob/v2.0.12/platform-controller/src/main/java/org/hobbit/controller/docker/ContainerManagerImpl.java#L702
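The dump above shows the timer thread parked in a `Future.get()` with no timeout, so a request that never completes blocks every later task on that timer. One possible mitigation, sketched here under assumptions, is to run the blocking call on a helper thread and wait for it with a bounded `Future.get(timeout, unit)`. The class `BoundedCall`, the method `callWithTimeout`, and the sleeping stand-in for the Docker request are all hypothetical; this is not the platform's code and not part of the Docker client API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {

    /**
     * Runs the call on a helper thread and waits at most timeoutMillis.
     * Returns the fallback value if the call times out or fails.
     */
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a client request that never returns.
        Callable<Integer> stuck = () -> {
            Thread.sleep(60_000);
            return 0;
        };
        Integer exitCode = callWithTimeout(stuck, 200, null);
        System.out.println("stuck call -> " + exitCode);

        Integer ok = callWithTimeout(() -> 137, 200, null);
        System.out.println("fast call  -> " + ok);
    }
}
```

Creating an executor per call keeps the sketch short; a long-running service would more likely share one executor. Note also that interrupting the helper thread only helps if the underlying call responds to interruption, so configuring read and connect timeouts on the HTTP client itself, where the library supports it, is usually the more robust fix.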