jozsefszalma opened 1 week ago
@jozsefszalma I agree; adding this would be beneficial. The only thing I would suggest is to add a way to keep the model loaded in case someone needs it for responsiveness. I’d set it up so that if the variable is set to -1, the model stays loaded indefinitely.
FYI, this is a tad more complex than I initially expected; the MeloTTS code loads some additional tensors during inference that are not cleaned up afterwards. So even if I move the model to the CPU, delete the model object, empty the torch cache, and run a garbage collect, there is still ~1.2 GB of VRAM consumed.
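For reference, the teardown I'm describing looks roughly like this (a sketch; `model` stands for the already-loaded MeloTTS instance, and the load line in the comment is illustrative):

```python
import gc
import torch

# `model` is assumed to already be loaded, e.g. something like:
# from melo.api import TTS; model = TTS(language="EN", device="cuda")

model.to("cpu")           # move the weights off the GPU
del model                 # drop the Python reference
gc.collect()              # force a garbage-collection pass
torch.cuda.empty_cache()  # return cached CUDA blocks to the driver

# Even after all of this, nvidia-smi still reports ~1.2 GB of VRAM in use,
# presumably tensors allocated inside the MeloTTS inference path that never
# get released.
```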
I guess I will raise the topic with them and see.
Hey Tim,
as far as I can tell, the model hangs around in GPU memory indefinitely, which might be undesirable for certain use cases. I could submit a PR that introduces a timeout env variable (defaulting to 15 min if not set), something like this (not yet tested):
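A rough sketch of the idea follows; the env var name `TTS_MODEL_UNLOAD_TIMEOUT`, the one-minute polling interval, and the `unload_model()` helper are illustrative placeholders, not the actual snippet. A negative value (e.g. -1) keeps the model loaded indefinitely, per the suggestion above.

```python
import os
import threading
import time

# Idle minutes before unloading; any negative value disables unloading.
# Variable name and 15-minute default are placeholders.
UNLOAD_TIMEOUT_MIN = float(os.getenv("TTS_MODEL_UNLOAD_TIMEOUT", "15"))

_last_used = time.monotonic()
_lock = threading.Lock()

def unload_model():
    """Placeholder: move the model to CPU, drop references, free VRAM."""
    ...

def mark_used():
    """Call at the start of every inference request to reset the idle clock."""
    global _last_used
    with _lock:
        _last_used = time.monotonic()

def _watchdog():
    while True:
        time.sleep(60)  # poll once a minute
        if UNLOAD_TIMEOUT_MIN < 0:
            continue  # -1 (or any negative): keep the model loaded forever
        with _lock:
            idle = time.monotonic() - _last_used
        if idle >= UNLOAD_TIMEOUT_MIN * 60:
            unload_model()

threading.Thread(target=_watchdog, daemon=True).start()
```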