insitro / redun

Yet another redundant workflow engine
https://insitro.github.io/redun/
Apache License 2.0

Docker executor: No such container #35

Closed ricomnl closed 2 years ago

ricomnl commented 2 years ago

To reproduce: I cloned the latest state of the redun repository and installed it in editable mode (pip install -e .). Then I ran the following:

cd examples/docker
cp ../05_aws_batch/data.tsv .
cd docker
make setup
make build
cd ..

I also added a docker executor to the .redun/redun.ini file in the docker example folder:

# redun configuration.

[backend]
db_uri = sqlite:///redun.db

[executors.default]
type = local
max_workers = 20

[executors.docker]
type = docker
image = redun_example
scratch = scratch
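A quick way to double-check that the file really declares the executor you expect is to parse it with Python's standard configparser. This is only a sanity-check sketch: the helper name list_executors is hypothetical, the config text is copied from above, and redun's own config loader may treat the file differently.

```python
import configparser

# Config text copied from this issue; on disk it lives in .redun/redun.ini.
CONFIG_TEXT = """
[backend]
db_uri = sqlite:///redun.db

[executors.default]
type = local
max_workers = 20

[executors.docker]
type = docker
image = redun_example
scratch = scratch
"""


def list_executors(config_text):
    """Return {executor_name: options} for every [executors.NAME] section.

    Hypothetical helper for illustration; not part of redun.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prefix = "executors."
    return {
        section[len(prefix):]: dict(parser[section])
        for section in parser.sections()
        if section.startswith(prefix)
    }


executors = list_executors(CONFIG_TEXT)
for name, options in executors.items():
    print(name, options)
```

Note that configparser returns every value as a string, so max_workers comes back as "20" rather than an integer.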

Upon running redun run workflow.py main, I encountered the following error:

[redun] Executor[docker]: submit redun job b4be464d-aa88-499b-9946-4404a3481bc8 as Docker container 89c03051ce785094202c556bfc9f619e2c005eaf9f16a8355bdbff684e955cc4:
[redun]   container_id = 89c03051ce785094202c556bfc9f619e2c005eaf9f16a8355bdbff684e955cc4
[redun]   scratch_path = /Users/ricomeinl/Desktop/retro/redun/examples/docker/.redun/scratch/jobs/9b358c2bab1d7db8c92811b1c7ef53fac23209fe
[redun] 
[redun] *** Workflow error
[redun] 
[redun] | JOB STATUS 2022/05/28 15:56:51
[redun] | TASK                                         PENDING RUNNING  FAILED  CACHED    DONE   TOTAL
[redun] | 
[redun] | ALL                                                1       5       0       0       0       6
[redun] | redun.examples.docker.count_colors_by_script       0       1       0       0       0       1
[redun] | redun.examples.docker.main                         0       1       0       0       0       1
[redun] | redun.examples.docker.task_on_docker               0       1       0       0       0       1
[redun] | redun.postprocess_script                           1       0       0       0       0       1
[redun] | redun.script                                       0       1       0       0       0       1
[redun] | redun.script_task                                  0       1       0       0       0       1
[redun] 
[redun] Execution duration: 2.11 seconds
Error: No such container: 5e83c5b505f4fbd4a53850ff44b556b409337675d4a165a88ad89614568e8125
[redun] *** Execution failed. Traceback (most recent task last):
[redun]   File "/Users/ricomeinl/Desktop/retro/redun/redun/executors/docker.py", line 347, in _monitor
[redun]     for job in jobs:
[redun]   File "/Users/ricomeinl/Desktop/retro/redun/redun/executors/docker.py", line 237, in iter_job_status
[redun]     logs = subprocess.check_output(["docker", "logs", job_id]).decode("utf8")
[redun]   File "/Users/ricomeinl/.pyenv/versions/3.8.13/lib/python3.8/subprocess.py", line 415, in check_output
[redun]     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
[redun]   File "/Users/ricomeinl/.pyenv/versions/3.8.13/lib/python3.8/subprocess.py", line 516, in run
[redun]     raise CalledProcessError(retcode, process.args,
[redun] CalledProcessError: Command '['docker', 'logs', '5e83c5b505f4fbd4a53850ff44b556b409337675d4a165a88ad89614568e8125']' returned non-zero exit status 1.

mattrasmus commented 2 years ago

Thanks for reporting. I think there is a small regression in how local Docker containers are cleaned up. I have a proposed fix in #36.
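For readers hitting the same traceback: the failure comes from calling docker logs on a container that no longer exists, which makes subprocess.check_output raise CalledProcessError. The sketch below shows one defensive way to handle that case. It is not the change made in #36; the function name fetch_container_logs and the injectable run parameter are assumptions introduced here for illustration and testing.

```python
import subprocess


def fetch_container_logs(container_id, run=subprocess.run):
    """Return the container's logs as text, or None if `docker logs` fails.

    Hypothetical helper, not redun's implementation. `run` defaults to
    subprocess.run and can be swapped for a stub in tests.
    """
    try:
        result = run(
            ["docker", "logs", container_id],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
    except subprocess.CalledProcessError:
        # For example "No such container": the container was already removed.
        return None
    return result.stdout.decode("utf8")
```

A caller can then treat a None result as "logs unavailable" instead of letting the whole monitoring loop crash.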

mattrasmus commented 2 years ago

@ricomnl When you get a chance, can you confirm whether the latest main branch solves this issue? Thanks again for reporting.

ricomnl commented 2 years ago

Yes, that did indeed solve it! Sorry for the delay on my end. Thanks for the prompt response on this!!