zocker-160 / sheepit-docker

A docker container for the SheepIt! render farm with CUDA support
https://www.sheepit-renderfarm.com/

improvement - add a healthcheck #4

Open bugaenkoleonid opened 10 months ago

bugaenkoleonid commented 10 months ago

I'm using your new docker tag for deployment on vast.ai. Thank you so much for that, it works pretty well!

But sometimes agents hang and freeze the entire render queue.

Maybe you can use this solution in your image?

zocker-160 commented 10 months ago

Sounds like a good addition, I will look into it.

However, it might also be useful to investigate why / where / when it is freezing, because normally this should be fixed in the sheepit client itself. I do not know if the sheepit client also has a health check, but I would argue that it would be its job to detect a stall and do something about it.

DaCoolX commented 10 months ago

If I recall correctly, that image does its healthcheck by watching the logs and maybe comparing timestamps.

Thing is, this tended to happen on older, community-driven SheepIt docker images. I know because I run one constantly myself.

On the official image this does not happen anymore.

Should also work with modern docker CUDA support, if it's needed.

Anywho, a native healthcheck in the client would probably be wise to implement. Just watching logs seems primitive and error-prone.
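For reference, the log-watching approach mentioned above could look roughly like the sketch below. It only checks how long ago the log file was last modified; it does not parse log lines. The log path, the environment variable names, and the 30-minute threshold are all assumptions for illustration, not values taken from any existing image or from the SheepIt client.

```python
import os
import time

# Hypothetical defaults: neither the path nor the threshold comes from a real image.
LOG_PATH = os.environ.get("SHEEPIT_LOG", "/sheepit/sheepit.log")
MAX_IDLE_SECONDS = int(os.environ.get("SHEEPIT_MAX_IDLE", "1800"))


def is_stalled(log_path, max_idle_seconds, now=None):
    """Return True if the log is missing or was not modified recently."""
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(log_path)
    except OSError:
        # No readable log at all is treated as unhealthy.
        return True
    return (now - last_write) > max_idle_seconds


def main():
    """Return 1 (unhealthy) if the client looks stalled, else 0 (healthy)."""
    return 1 if is_stalled(LOG_PATH, MAX_IDLE_SECONDS) else 0

# When saved as a script, finish with: import sys; sys.exit(main())
```

A script like this could be wired into a Dockerfile `HEALTHCHECK` instruction, which treats exit code 0 as healthy and 1 as unhealthy. Note that a long-running render frame may legitimately produce no log output for a while, so the threshold would need tuning, which is part of why this approach is error-prone.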

zocker-160 commented 10 months ago

It is a client issue. I had this happen natively too when running for a long time. Not sure why you think this would be docker related, because it isn't.

A health check makes perfect sense in an environment where you pay per minute, so it is a very reasonable request.

DaCoolX commented 10 months ago

Just a hunch. It used to happen every few months, but it does not anymore.

I will see to it to get a health signal natively implemented in the client.

bugaenkoleonid commented 10 months ago

I couldn't run the official image on vast.ai

DaCoolX commented 10 months ago

I don't wanna clutter this issue with conversation about a separate docker image. If you want, there is our GitLab Issue Tracker. You can also reach out to me on Discord via @dacool