Open smirnp opened 5 years ago
The platform also hangs for ~10 minutes (while the GUI says "Failed Unable to remove experiment 1541700297545"), producing the following logs:
2018-11-08 19:18:51,279 ERROR [org.hobbit.controller.ExperimentManager] - <The experiment http://w3id.org/hobbit/experiments#1541700297545 was stopped by the user. Forcing termination.>
2018-11-08 19:18:51,311 WARN [org.hobbit.controller.docker.ContainerManagerImpl] - <Container for task qpduw750i1dmv8y3nvaxba4b1 has no exit code, assuming 0>
2018-11-08 19:18:51,311 INFO [org.hobbit.controller.docker.ContainerManagerImpl] - <Removing service of container with task id qpduw750i1dmv8y3nvaxba4b1. >
2018-11-08 19:18:51,645 WARN [org.hobbit.controller.docker.ContainerManagerImpl] - <Container for task js0fptg3fcmamw29hfbxsb4j0 has no exit code, assuming 0>
2018-11-08 19:18:51,646 INFO [org.hobbit.controller.docker.ContainerManagerImpl] - <Removing service of container with task id js0fptg3fcmamw29hfbxsb4j0. >
2018-11-08 19:18:51,946 WARN [org.hobbit.controller.docker.ContainerManagerImpl] - <Container for task m25536ri7s46d7scbj93m2cnz has no exit code, assuming 0>
2018-11-08 19:18:51,947 INFO [org.hobbit.controller.docker.ContainerManagerImpl] - <Removing service of container with task id m25536ri7s46d7scbj93m2cnz. >
2018-11-08 19:18:52,237 WARN [org.hobbit.controller.docker.ContainerManagerImpl] - <Container for task q9qqxld9lcvyy498g42p865pg has no exit code, assuming 0>
2018-11-08 19:18:52,238 INFO [org.hobbit.controller.docker.ContainerManagerImpl] - <Removing service of container with task id q9qqxld9lcvyy498g42p865pg. >
2018-11-08 19:18:52,535 WARN [org.hobbit.controller.docker.ContainerManagerImpl] - <Container for task v76rglb9xjcq9xk39k2x0weoy has no exit code, assuming 0>
2018-11-08 19:18:52,535 INFO [org.hobbit.controller.docker.ContainerManagerImpl] - <Removing service of container with task id v76rglb9xjcq9xk39k2x0weoy. >
2018-11-08 19:19:06,266 INFO [org.hobbit.controller.docker.ContainerStateObserverImpl] - <Couldn't get the status of container js0fptg3fcmamw29hfbxsb4j0. Assuming it was stopped by the platform.>
2018-11-08 19:19:06,266 INFO [org.hobbit.controller.PlatformController] - <Container js0fptg3fcmamw29hfbxsb4j0 stopped with exitCode=137>
2018-11-08 19:19:06,267 INFO [org.hobbit.controller.ExperimentManager] - <Sending broadcast message...>
2018-11-08 19:19:06,328 INFO [org.hobbit.controller.ExperimentManager] - <Unknown container js0fptg3fcmamw29hfbxsb4j0 stopped with exitCode=137>
2018-11-08 19:19:06,402 WARN [org.hobbit.controller.docker.ContainerManagerImpl] - <Couldn't remove container js0fptg3fcmamw29hfbxsb4j0 because it doesn't exist>
The containers mentioned above crashed on their own before I cancelled the experiment in the GUI.
It seems that experimentTimeout is not flushed after this, and the platform does not check the queue for new experiments for ~10 minutes.
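To illustrate the behaviour I would expect, here is a minimal sketch. It is not the actual HOBBIT controller code: the class `TimeoutSketch` and its methods `start`, `terminate` and `isIdle` are hypothetical names made up for this example, and it only assumes that the timeout is some scheduled task tied to the running experiment. The point is that terminating an experiment should also clear its pending timeout, so the slot is free for the next queued experiment right away.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch; none of these names are taken from the HOBBIT code base.
class TimeoutSketch {
    // Daemon timer so a pending timeout never keeps the JVM alive.
    private final Timer timer = new Timer("experiment-timeout", true);
    private TimerTask pendingTimeout;   // timeout of the running experiment, if any
    private String runningExperimentId; // null while no experiment is running

    // Marks the experiment as running and schedules its timeout.
    public synchronized void start(String experimentId, long timeoutMillis) {
        runningExperimentId = experimentId;
        pendingTimeout = new TimerTask() {
            @Override
            public void run() {
                terminate(experimentId);
            }
        };
        timer.schedule(pendingTimeout, timeoutMillis);
    }

    // Terminates the experiment (user cancel or timeout) and clears the
    // pending timeout, so the slot becomes idle immediately.
    public synchronized boolean terminate(String experimentId) {
        if (!experimentId.equals(runningExperimentId)) {
            return false; // unknown or already terminated experiment
        }
        if (pendingTimeout != null) {
            pendingTimeout.cancel();
            pendingTimeout = null;
        }
        runningExperimentId = null;
        return true;
    }

    // True when the next experiment from the queue could be started.
    public synchronized boolean isIdle() {
        return runningExperimentId == null;
    }
}
```

With something like this, calling `terminate("1541700297545")` after a user cancel would leave `isIdle()` returning true immediately, instead of the slot staying blocked until the original timeout expires.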
Hi!
This is the error I got when it failed to cancel the experiment: