clin99 opened 6 years ago
I am also unable to remove the experiment from the queue.
It seems like the thread responsible for terminating experiments that take too much time was canceled. I restarted the controller.
I created a log file for further analysis.
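This sketch shows one way a timeout watchdog can stop without any visible error. It assumes the watchdog runs as a java.util.Timer task, like the ContainerStateObserverImpl task in the later thread dump. If a TimerTask lets an unchecked exception escape from run(), the Timer's only thread dies, and every later schedule() call throws IllegalStateException. The sketch demonstrates that behaviour and shows a guard that keeps the timer alive. TimerWatchdogDemo, guarded, and timerSurvivesFailingTask are hypothetical names, not HOBBIT code:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch only: shows how an uncaught exception silently kills a java.util.Timer,
// and how catching exceptions inside the task keeps it alive.
final class TimerWatchdogDemo {

    // Hypothetical helper: wraps a periodic check so that a RuntimeException is
    // logged instead of escaping run(), which would terminate the Timer's thread.
    static TimerTask guarded(Runnable check) {
        return new TimerTask() {
            @Override
            public void run() {
                try {
                    check.run();
                } catch (RuntimeException e) {
                    System.err.println("watchdog check failed: " + e);
                }
            }
        };
    }

    // Schedules one failing task, then reports whether the Timer still accepts work.
    static boolean timerSurvivesFailingTask(boolean useGuard) {
        Timer timer = new Timer("watchdog", true); // daemon thread, so the JVM can exit
        CountDownLatch ran = new CountDownLatch(1);
        Runnable failing = () -> {
            ran.countDown();
            throw new IllegalStateException("simulated failure in a periodic check");
        };
        TimerTask task = useGuard ? guarded(failing) : new TimerTask() {
            @Override
            public void run() {
                failing.run(); // the exception escapes to the Timer's thread
            }
        };
        timer.schedule(task, 0);
        try {
            ran.await(1, TimeUnit.SECONDS);
            Thread.sleep(200); // give a dying timer thread time to shut down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the timer", e);
        }
        try {
            timer.schedule(guarded(() -> { }), 0);
            timer.cancel();
            return true;
        } catch (IllegalStateException e) {
            // Thrown as "Timer already cancelled." once the timer thread has died.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: timer alive = " + timerSurvivesFailingTask(false));
        System.out.println("guarded:   timer alive = " + timerSurvivesFailingTask(true));
    }
}
```

A ScheduledExecutorService has a similar pitfall: a periodic task that throws is not run again. Catching exceptions inside the task body is therefore worth doing with either scheduler.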
Another one:
The experiment http://w3id.org/hobbit/experiments#1598593574572 took too much time. Forcing termination.
It looks like the platform initiated the termination of the benchmark service, but not the system service.
The call that got stuck is probably listTasks in the Docker API library:
"Timer-0" #35 prio=5 os_prio=0 cpu=571.76ms elapsed=15419.73s tid=0x00007ff6c4876800 nid=0x46 waiting on condition [0x00007ff60a4ad000]
    - parking to wait for <0x000000060c935200> (a jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.5/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.5/AbstractQueuedSynchronizer.java:1039)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.5/AbstractQueuedSynchronizer.java:1345)
    at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
    at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2639)
    at com.spotify.docker.client.DefaultDockerClient.version(DefaultDockerClient.java:517)
    at com.spotify.docker.client.DefaultDockerClient.assertApiVersionIsAbove(DefaultDockerClient.java:2772)
    at com.spotify.docker.client.DefaultDockerClient.listTasks(DefaultDockerClient.java:2002)
    at org.hobbit.controller.docker.ContainerManagerImpl.getContainerExitCode(ContainerManagerImpl.java:702)
    at org.hobbit.controller.docker.ContainerStateObserverImpl$1.run(ContainerStateObserverImpl.java:97)
    at java.util.TimerThread.mainLoop(java.base@11.0.5/Timer.java:556)
    at java.util.TimerThread.run(java.base@11.0.5/Timer.java:506)
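The thread dump shows the Timer thread parked in an untimed Future.get() underneath listTasks, so the observer task never returns. java.util.Timer runs all of its tasks one after another on a single thread. A task stuck like this therefore also delays every other task scheduled on the same Timer.

The sketch below shows one way to bound such a call. It assumes the call can be moved onto a separate worker thread. BoundedCall and tryCall are hypothetical names, not part of HOBBIT or docker-client:

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: runs a possibly-hanging call on a worker thread and gives up
// after a timeout, so the calling thread (e.g. a periodic observer) keeps running.
final class BoundedCall {

    // Daemon worker threads, so a leaked (still blocked) worker cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    // Returns the call's result, or Optional.empty() if it failed or did not
    // finish within the timeout.
    static <T> Optional<T> tryCall(Callable<T> call, long timeout, TimeUnit unit) {
        Future<T> future = POOL.submit(call);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; blocking I/O may ignore this
            System.err.println("call did not finish within " + timeout + " " + unit);
            return Optional.empty();
        } catch (ExecutionException e) {
            System.err.println("call failed: " + e.getCause());
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCall(() -> "ok", 1, TimeUnit.SECONDS));
        System.out.println(tryCall(() -> { Thread.sleep(5_000); return "late"; },
                100, TimeUnit.MILLISECONDS));
    }
}
```

With this helper, the observer could call something like tryCall(() -> dockerClient.listTasks(), 30, TimeUnit.SECONDS), where dockerClient is a hypothetical DockerClient instance. It could then treat an empty result as "state unknown, retry on the next tick" instead of blocking forever.

If the worker is blocked in non-interruptible I/O, cancel(true) will not free it. The sketch only keeps the calling thread responsive.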
An experiment keeps running even after its max runtime has become negative. Is this normal?
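Suppose the value shown is a remaining budget, computed as the maximum runtime minus the elapsed time. That is an assumption; the issue does not say how the platform derives it. Under that reading, a negative value means the experiment has already overrun, and a working watchdog should have forced termination as soon as the value reached zero.

A minimal sketch of that check follows. ExperimentDeadline, remaining, and shouldTerminate are hypothetical names, and the start time is illustrative:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch only: a hypothetical model of an experiment's remaining time budget.
final class ExperimentDeadline {

    // Remaining budget = max runtime - elapsed time; negative once the experiment overruns.
    static Duration remaining(Instant start, Duration maxRuntime, Instant now) {
        return maxRuntime.minus(Duration.between(start, now));
    }

    // A watchdog should force termination once the budget is used up (zero or negative).
    static boolean shouldTerminate(Instant start, Duration maxRuntime, Instant now) {
        Duration left = remaining(start, maxRuntime, now);
        return left.isNegative() || left.isZero();
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-01-01T00:00:00Z"); // illustrative start time
        Duration max = Duration.ofMinutes(60);
        Instant now = start.plus(Duration.ofMinutes(75));
        System.out.println(remaining(start, max, now));       // PT-15M
        System.out.println(shouldTerminate(start, max, now)); // true
    }
}
```

Under this model, a value that goes negative while the experiment keeps running suggests the check is not being executed at all (for example, a dead or blocked timer thread), rather than the check deciding to wait.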