statisticalbiotechnology / quandenser-pipeline

A nextflow/singularity pipeline for quandenser
Apache License 2.0

Running jobs says "Failed" even though the pipeline is still running #21

Closed TimothyOlsson closed 5 years ago

TimothyOlsson commented 5 years ago

In some cases, the GUI shows a job as "Failed" even though the process is still running. The "running jobs" tab sometimes cannot find the process by its PID (even though it can easily be found with ps aux | grep PID), so the GUI marks the job as "Failed". It could be some kind of sync problem: when the "running jobs" tab updates, stdout.txt may not have been created yet and the GUI may not yet be able to find the PID. This is a minor problem, since it does not affect the process itself, but it should be investigated because it prevents the user from killing the job via the GUI.
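If the cause is a timing issue, one possible mitigation is to require several consecutive failed lookups before a job is labelled "Failed". The following is only a sketch under that assumption, not the GUI's actual code: `pid_state`, `next_label`, the intermediate "Checking" label and the threshold of 3 are all hypothetical names and values. It probes the PID with signal 0, which performs the existence check without delivering a signal.

```python
import os


def pid_state(pid: int) -> str:
    """Return 'running', 'not_found' or 'unknown' for a PID (hypothetical helper)."""
    if pid <= 0:
        return "unknown"
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is sent
    except ProcessLookupError:
        return "not_found"
    except PermissionError:
        return "running"  # the process exists but belongs to another user
    except OSError:
        return "unknown"
    return "running"


def next_label(state: str, misses: int, max_misses: int = 3):
    """Turn one probe result into a GUI label and an updated miss counter.

    A job is only labelled "Failed" after max_misses consecutive probes
    that did not find the process (the threshold is an assumed value).
    """
    if state == "running":
        return "Running", 0
    misses += 1
    if misses >= max_misses:
        return "Failed", misses
    return "Checking", misses
```

Note that this only helps with a transient miss. If the process can never be seen from where the check runs, the job would still end up as "Failed" after a few refreshes.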

TimothyOlsson commented 5 years ago

Further testing shows that the problem may depend on which Singularity version is used. When the PID checker (i.e. ps ux | grep PID in the shell) runs in Singularity versions <v3.2, it returns process information for all of the user's processes, but in v3.4 the exact same command does not return the process information of the host.

This will require one of two solutions:

Edit: Running ps aux in v3.4 shows no processes from the host computer.
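When comparing versions, it helps to make the check itself unambiguous. A plain grep PID can also match the grep process itself or a longer PID that contains the same digits. The sketch below is a hypothetical helper (`pid_in_ps_output` is not part of the pipeline) that matches only on the PID column of ps ux output; the sample text is made up for illustration.

```python
def pid_in_ps_output(ps_output: str, pid: int) -> bool:
    """Return True if pid appears in the PID column of `ps ux`-style output."""
    lines = ps_output.splitlines()
    if not lines:
        return False
    header = lines[0].split()
    if "PID" not in header:
        return False
    col = header.index("PID")  # locate the PID column from the header row
    for line in lines[1:]:
        fields = line.split()
        # Compare the whole field, so 123 does not match 1234
        if len(fields) > col and fields[col] == str(pid):
            return True
    return False


# Made-up sample output, for illustration only
SAMPLE = """USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user      1234  0.0  0.1  12345  6789 pts/0    S    10:00   0:00 nextflow run
user     12345  0.0  0.0   1111   222 pts/0    S+   10:01   0:00 grep 1234
"""

found = pid_in_ps_output(SAMPLE, 1234)
partial = pid_in_ps_output(SAMPLE, 123)
```

With this kind of check, an empty result in v3.4 can be attributed to the host processes not being listed at all, rather than to a matching quirk.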

TimothyOlsson commented 5 years ago

"Fixed" in https://github.com/statisticalbiotechnology/quandenser-pipeline/commit/25419aa87317ad1e8052dfad90ad9b64548964fb

The shell script now installs Singularity v3.2.1 instead of the latest branch (currently v3.4.0). Version v3.2.1 allows interaction with host processes, so running processes can be checked and killed.
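Since the fix relies on a pinned version, a GUI could also warn when a different Singularity version is installed. This is a sketch only: the function names are hypothetical, the version text is assumed to contain an X.Y.Z number (for example the output of singularity --version), and treating everything from v3.4.0 upwards as affected is an extrapolation from the single version tested in this thread.

```python
import re

KNOWN_GOOD_MAX = (3, 2, 1)  # newest version reported in this thread to see host processes
KNOWN_BAD_MIN = (3, 4, 0)   # version reported in this thread to hide host processes


def parse_version(text: str):
    """Extract the first X.Y.Z version number from text, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def host_pid_visibility(version_text: str) -> str:
    """Classify a version as 'visible', 'hidden', 'untested' or 'unknown'."""
    version = parse_version(version_text)
    if version is None:
        return "unknown"
    if version <= KNOWN_GOOD_MAX:
        return "visible"
    if version >= KNOWN_BAD_MIN:
        return "hidden"  # assumption: later releases behave like v3.4.0
    return "untested"    # versions between 3.2.1 and 3.4.0 were not tried here
```

For example, host_pid_visibility("singularity version 3.2.1") returns "visible", while a version between the two bounds is reported as "untested" rather than guessed.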