mrhan1993 / Fooocus-API

FastAPI powered API for Fooocus
GNU General Public License v3.0

When running in Docker, GPU memory always increases until the process is OOM-killed and restarted #245

Closed PeakLee closed 3 months ago

PeakLee commented 3 months ago

[screenshot]

We deploy fooocus-api online and have to restart it periodically!!

fooocus-api's memory management needs to improve.

mrhan1993 commented 3 months ago

Which version are you using now?

PeakLee commented 3 months ago

@mrhan1993

v0.3.29

When the pod restarts, the memory usage progress bar resets to 1%; after processing text2img and img2img requests for a short time, it shows 95%.

docker command: sudo docker run --restart always -d -e TZ=Asia/Shanghai -v /data/model_sync:/mnt --name fooocus-v329-cn --cpus 2.5 --gpus '"device=4"' fooocus-v329-v2 python main.py

Hope this helps! Thanks.

mrhan1993 commented 3 months ago

I have generated dozens of pictures in succession with the latest version, and there is basically no fluctuation in memory usage. Maybe you can try the latest version.

PeakLee commented 3 months ago

If Fooocus-API always loads the same model and LoRA files, it works well, as expected,
but when it loads a different model and LoRA file per request, memory usage keeps increasing and everything is held in memory until OOM!! @mrhan1993 help ~~

mrhan1993 commented 3 months ago

OK, I will do more testing.

PeakLee commented 3 months ago

I just appended the option "-e PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.8" to the docker run command, but I am not sure whether it is enough. Could you give me some suggestions to avoid GPU OOM? Really appreciated!! @mrhan1993
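As a side note, PYTORCH_CUDA_ALLOC_CONF is only read once, when PyTorch initializes CUDA, which is why passing it via `-e` to docker run works. For local testing outside Docker, a minimal sketch of exporting the same allocator option from Python before importing torch (the threshold value here is simply the one used in this thread, not a recommendation):

```python
import os

# Must be set before torch touches CUDA: the caching allocator reads
# PYTORCH_CUDA_ALLOC_CONF once at initialization time.
# garbage_collection_threshold:0.8 tells the allocator to start reclaiming
# unused cached blocks once roughly 80% of GPU memory is allocated.
os.environ.setdefault(
    "PYTORCH_CUDA_ALLOC_CONF",
    "garbage_collection_threshold:0.8",
)

try:
    import torch  # imported only after the env var is in place
except ImportError:
    # torch not installed in this environment; the env var is still
    # exported for any subprocess that does have it.
    torch = None
```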

mrhan1993 commented 3 months ago

In another branch, FooocusIntegration, I rewrote the task system, and after initial testing neither RAM OOM nor GPU memory OOM occurs. But this branch has not been fully tested; if you are interested, you can deploy it locally for testing.

PeakLee commented 3 months ago

In version v0.3.29, I added the following code in fooocusapi/worker.py, right after the line: print(f'Generating and saving time: {execution_time:.2f} seconds')

It works well now!

print('--memory stats--:', model_management.get_free_memory(torch_free_too=True))
model_management.cleanup_models()  # key1
model_management.soft_empty_cache()  # key2
print('--memory stats--:', model_management.get_free_memory(torch_free_too=True))

Really appreciated, thanks a lot! @mrhan1993

mrhan1993 commented 3 months ago

Thanks, I will update it after a while.