knorr3 closed this 2 years ago
@knorr3 Could you try adding the timeout_ready: -1 config (as in dalle) to the diffusion executor in flow.yml?
Currently trying that, thank you. But why would diffusion take so long? I thought dalle would take ~8 minutes to start up because of the download.
I'm not sure either.
If you run it using Docker, the diffusion model weights are already downloaded, so it shouldn't take this long. Maybe something is still being downloaded from the internet.
To be honest, I don't think this is the problem. I am currently at 45 minutes of waiting time. The download speed should be fast enough at ~20 MB/s :-).
You can try starting only diffusion by commenting out the other executors and anything else that isn't relevant in flow.yml. That way you'll at least have less clutter on the console, and you might catch an error log.
Did that, but there's not a single error message. Only some DeprecationWarnings about PIL.Nearest.
Found the issue. Jina requires a lot of system memory (RAM, not VRAM), and the container ran into an OOM. Somehow OpenShift didn't tell me this, so I searched for the error in the wrong place. I increased the memory limit from 4 GB to 32 GB and now it starts.
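For anyone hitting the same silent timeout: one way to check for an OOM kill from inside the container is to read the cgroup counters. The following is a minimal sketch, not something from this thread. It assumes the node uses cgroup v2 and that the files are visible at /sys/fs/cgroup/memory.events and /sys/fs/cgroup/memory.max; the helper names are made up for illustration. On cgroup v1 the paths differ, and the counter may only reflect kills in the current container's cgroup.

```python
from pathlib import Path

# Assumed cgroup v2 locations as seen from inside the container.
MEMORY_EVENTS = Path("/sys/fs/cgroup/memory.events")
MEMORY_MAX = Path("/sys/fs/cgroup/memory.max")


def parse_memory_events(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed lines: a name followed by a non-negative integer.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_count(text: str) -> int:
    """Return the oom_kill counter, or 0 if the key is absent."""
    return parse_memory_events(text).get("oom_kill", 0)


if __name__ == "__main__":
    if MEMORY_EVENTS.exists():
        if MEMORY_MAX.exists():
            # Prints a byte count, or "max" when no limit is set.
            print("memory limit:", MEMORY_MAX.read_text().strip())
        print("oom kills:", oom_kill_count(MEMORY_EVENTS.read_text()))
    else:
        print("memory.events not found; this node may use cgroup v1")
```

A non-zero oom_kill count, together with a memory limit close to the 4 GB mentioned above, would point to the same cause without relying on the platform to report it.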
Hello, I am trying to run the dalle-flow server container on OpenShift with an Nvidia A40 GPU. After waiting 10 minutes for dalle and diffusion, the diffusion executor terminates with a timeout. I have tried the latest container image and also tried installing the latest version manually inside the container image. Another problem is that there is not a single error or debug message coming from the diffusion executor.
Does anyone have the same problem, or any idea how I can debug this? Thank you very much.