MindOfMatter opened 9 months ago
Hi
I just created a fork of the Fooocus project, synced with the original project's main branch: https://github.com/MindOfMatter/Fooocus-MindOfMatter-Edition
I created a PR in my fork to show what we need to change to enable the feature:
https://github.com/MindOfMatter/Fooocus-MindOfMatter-Edition/pull/19
If you want to test it, you need to replace your local Fooocus folder with mine (checked out to the relevant feature branch).
Also, my dev branch contains all of the merged and tested PR features in my fork: https://github.com/MindOfMatter/Fooocus-MindOfMatter-Edition/tree/dev
I hope this helps you :)
Have a nice day
--always-high-vram or --always-gpu serve the same purpose, except they will actually keep the model loaded in VRAM, right? The default is to put a model on the offload device, which is the CPU in most cases, so storing it in system memory should already be the default behavior. As far as I can see, the model cache global variable keeps every model loaded, so when multiple models are "cached" the memory will not be freed, and when multiple users use multiple models the system can be overwhelmed, right? How does this solution differ significantly from the approaches mentioned?
According to my tests, the cache especially helps if, like me, you have a lot of RAM (32 GB in my case). I have not tested with these arguments, thank you for sharing them with me :)
With my hardware configuration, I haven't had any problems with it; on the contrary, loading is much faster because my RAM is larger than my VRAM (8 GB). I tested with 10 SDXL models of 6 GB each.
It all depends on the hardware configuration.
(It will be useful when switching between several stored models, for example when you often swap a model between refiner and base to see which works better, change its switch value, or alternate between several available refiners.)
https://github.com/MindOfMatter/Fooocus-MindOfMatter-Edition/pull/19/files
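For readers who don't want to open the diff, here is a minimal sketch of the idea as I understand it: a global dictionary keyed by checkpoint path that keeps loaded models resident in system RAM, so switching back to a previously used checkpoint skips the disk read. The names `MODEL_CACHE`, `load_model_from_disk`, and `get_model` are hypothetical and are not the identifiers actually used in the PR.

```python
# Hypothetical sketch of a RAM-side model cache (not the PR's actual code).
import time

MODEL_CACHE = {}  # checkpoint path -> loaded model object, kept in system RAM

def load_model_from_disk(path):
    # Stand-in for the real checkpoint loader (torch/safetensors in Fooocus).
    time.sleep(0.1)          # simulate the slow disk read
    return {"path": path}    # dummy model object

def get_model(path, cache_enabled=True):
    """Return the model for `path`, reading from disk only on a cache miss."""
    if cache_enabled and path in MODEL_CACHE:
        return MODEL_CACHE[path]           # hit: already resident in RAM
    model = load_model_from_disk(path)     # miss: slow path, read from disk
    if cache_enabled:
        MODEL_CACHE[path] = model          # keep it resident for the next switch
    return model
```

With a sketch like this, the first use of each checkpoint pays the full load cost, and every later switch back to it is essentially instant, which matches the behavior described above.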
I can imagine that this will lead to issues when you don't have sufficient memory to load new models into the list and keep them in memory. Would you please also test this scenario? Thanks!
OK, I am going to test with all my (16) models as refiners (because I also have some SD models), and I will share my performance graph for the last one.
My result:
Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz, NVIDIA GeForce RTX 3070 Ti
Then, if the model is already loaded into the cache, even if it has since been swapped out (no longer the one in VRAM), generation starts immediately.
Click the picture to start the GIF demo:
Just tested again. It works when it works, but there are obvious downsides. This only works well with plenty of available memory: once too much memory is used, the OS pages out to swap, which is slow and makes the system unresponsive. After paging to swap, the opposite of your intended behavior can be observed: every model access is slow. I've had my OS freeze for ~10 s on each switch, even with a 980 Pro NVMe M.2 SSD hosting the swap.
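One way to hedge against this would be to check free system memory before caching another checkpoint and evict old entries when headroom runs out. This is only a sketch of a possible mitigation, not something the PR implements; the 8 GB threshold, the LRU policy, and the `cache_model` helper are assumptions.

```python
# Hypothetical swap guard: only keep a model resident if enough free RAM
# remains, evicting the least recently used entry first. Sketch only.
from collections import OrderedDict
import psutil

MIN_FREE_BYTES = 8 * 1024**3          # assumed headroom: keep >= 8 GB free
MODEL_CACHE = OrderedDict()           # checkpoint path -> model, LRU order

def cache_model(path, model):
    # Drop the oldest cached models while free memory is below the threshold.
    # (Freeing is best-effort: Python may not return memory to the OS at once.)
    while MODEL_CACHE and psutil.virtual_memory().available < MIN_FREE_BYTES:
        evicted_path, _ = MODEL_CACHE.popitem(last=False)
        print(f"evicted {evicted_path} to free memory")
    if psutil.virtual_memory().available >= MIN_FREE_BYTES:
        MODEL_CACHE[path] = model     # safe to keep this one resident
```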
I don't think this is generally applicable, but I appreciate the config option. Would you mind creating an actual PR so it gets proper attribution?
https://github.com/MindOfMatter/Fooocus-MindOfMatter-Edition/pull/19