zydxt / sd-webui-rpg-diffusionmaster

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
GNU Affero General Public License v3.0

Integrate local model support via llama-cpp-python #13

Closed BetaDoggo closed 2 months ago

BetaDoggo commented 3 months ago

This PR adds support for local models via the llama-cpp-python library. It installs llama-cpp-python as a dependency: if CUDA is available on first launch, the CUDA-enabled build is installed; otherwise the CPU-only version is used. The local model option is available in the UI, where the user must enter the path of the GGUF file as well as the number of layers to offload.
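A minimal sketch of the first-launch selection described above. The wheel-index URL, CUDA tag, and function names are assumptions for illustration, not taken from the PR:

```python
import sys

# Hypothetical sketch of the install-time logic: pick the prebuilt
# CUDA wheel of llama-cpp-python when a GPU is visible, otherwise
# fall back to the CPU-only package.

def build_install_cmd(version: str, cuda: bool) -> list[str]:
    """Return the pip command used to install llama-cpp-python."""
    cmd = [sys.executable, "-m", "pip", "install",
           f"llama-cpp-python=={version}"]
    if cuda:
        # llama-cpp-python publishes prebuilt CUDA wheels on its own
        # index; the exact URL and CUDA tag here are assumptions.
        cmd += ["--extra-index-url",
                "https://abetlen.github.io/llama-cpp-python/whl/cu121"]
    return cmd

def has_cuda() -> bool:
    """Probe for CUDA via torch, which the webui already ships."""
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False
```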

I chose llama-cpp over other libraries because it has the best CPU offloading and an OpenAI-compatible completions function. For pure speed exllamav2 would have been better, but it is far less accessible.
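For reference, the OpenAI-compatible surface looks roughly like this; the model path, layer count, and the small helper function are illustrative, not part of the PR:

```python
# Illustrative use of llama-cpp-python's OpenAI-style chat API.
# The model path and layer count mirror the two UI fields described
# above; extract_text is our own helper, not from the PR.

def extract_text(completion: dict) -> str:
    """Pull the assistant text out of an OpenAI-style completion dict."""
    return completion["choices"][0]["message"]["content"]

if __name__ == "__main__":
    from llama_cpp import Llama

    llm = Llama(
        model_path="/path/to/model.gguf",  # GGUF path entered in the UI
        n_gpu_layers=20,                   # layers to offload; 0 = CPU only
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Describe the scene."}],
    )
    print(extract_text(out))
```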

Support for the other GPU acceleration methods available in llama-cpp could probably be added, but I don't have the hardware to test them.

zydxt commented 2 months ago

Thank you for your PR. I encountered a segmentation fault while testing your implementation, and I'm not sure whether it's due to my device or something else. I implemented a solution based on Hugging Face transformers, but because the version of the transformers library in the webui is quite old, my implementation cannot support popular models the way yours can. I still hope to adopt your solution after some testing.

BetaDoggo commented 2 months ago

Thanks for taking a look at it. I think the segfault you got could be related to this issue from the llama-cpp-python repo. I also get a segfault when using version 0.2.58, as described in the issue. It seems to be fixed for me in the newer versions, but some users are still reporting problems. I've now pinned the llama-cpp-python version at 0.2.56, which seems to be the last working version for most users. If you still have one of the broken versions, you'll have to delete it from your venv. I hope this helps.
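One way to confirm the venv actually picked up the pinned build (the helper and its defaults are illustrative; the package name matches PyPI):

```python
from importlib.metadata import PackageNotFoundError, version

def is_pinned(pkg: str = "llama-cpp-python",
              pinned: str = "0.2.56") -> bool:
    """Report whether the installed package matches the pinned version."""
    try:
        return version(pkg) == pinned
    except PackageNotFoundError:
        # package not installed in this environment
        return False
```

If this reports the wrong version after updating the extension, uninstalling and reinstalling (`pip uninstall llama-cpp-python` followed by `pip install llama-cpp-python==0.2.56`) clears the stale build.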

zydxt commented 2 months ago

Hi, sorry for the late response. I tested the pinned llama-cpp-python commit and it works well for me, no segmentation fault at all. Thank you so much for your PR!