zydxt / sd-webui-rpg-diffusionmaster

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
GNU Affero General Public License v3.0

Please support using local LLM #10

Closed wardensc2 closed 2 months ago

wardensc2 commented 4 months ago

Hi Zydxt

Please add support for using a local LLM, since the GPT-4 and Gemini APIs are a bit expensive. I can't wait to test a local LLM to generate high-res images.

Thank you so much

zydxt commented 4 months ago

Of course. I will try my best. :smile:

wardensc2 commented 4 months ago

> Of course. I will try my best. 😄

Thank you so much for taking the time to answer my request. Actually, I already got a free Gemini API key and tried it out. It runs great: it divides the regions as Gemini calculated and separates the prompts with BREAK. However, when I hit the generate button, the whole SD WebUI just stops; not a single step runs. No warning, no crash, and I can alt-tab to other software at ease.

Not sure whether it's because my Gemini API key is free-tier, but to make Gemini calculate the prompt I have to wait about 5-15 minutes per request, or your extension reports that no prompt was received from the API.

Just checked again:

If I use a complex object, it works. But if I use multi-attribute, it stops running. To fully test this extension, a local LLM is best, since the quotas of online APIs are limited.
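For readers unfamiliar with the BREAK syntax mentioned above: Regional Prompter splits a prompt into per-region sub-prompts separated by the `BREAK` keyword. Below is a minimal illustrative sketch of flattening an LLM's region plan into that format; the function and sample data are hypothetical, not the extension's actual code.

```python
# Hypothetical sketch: join a base prompt and per-region prompts with the
# BREAK separator that Regional Prompter expects. Illustrative only.

def build_regional_prompt(base_prompt: str, region_prompts: list[str]) -> str:
    """Flatten a base prompt plus per-region prompts into one BREAK-separated string."""
    return " BREAK\n".join([base_prompt] + region_prompts)

# Example region plan, as an LLM planner might return it:
plan = [
    "a red-haired woman, smiling, green dress",
    "a black cat sitting on a windowsill",
]
prompt = build_regional_prompt("cozy sunlit kitchen, masterpiece", plan)
print(prompt)
```

With two regions plus the base prompt, the result contains two `BREAK` separators, one between each pair of adjacent sub-prompts.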

Heather95 commented 4 months ago

@wardensc2 I use the free version of the Gemini Pro API with multi-attribute or complex object prompts and haven't encountered this issue. It takes me around 45 seconds to generate one image with an SDXL Pony model on an RTX 3080 and 16GB RAM. Sadly, the Gemini API heavily censors any content it thinks is even a little violent or NSFW, sometimes with false positives, and will throw an error before generation begins. This censorship behavior cannot be changed without altering the code (perhaps I should open an issue about this?) @zydxt

Other than this issue with Gemini, which renders RPG DiffusionMaster unusable for me due to my... preferences, there should be no issue with the free quota limit, which is 60 requests per minute. If anyone can generate 1 image per second with this project, then I applaud their government supercomputer. (I'm just teasing you here, wardensc.) It makes for a fair alternative until this project supports local LLMs.

However, if you were getting flagged for censorship, you would see "Error" text everywhere. I'm not sure what to recommend here, but you could try disabling all your extensions except RPG and RegionalPrompter, making certain RP and WebUI are up to date, or reinstalling A1111.
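On "cannot be changed without altering the code": the `google-generativeai` Python SDK does accept a `safety_settings` argument on `generate_content()`, so a code change along these lines is what such a fix would look like. The category and threshold strings below follow the public Gemini API documentation; whether `BLOCK_NONE` is actually honored depends on Google's account-level policy, and this is a sketch, not the extension's code.

```python
# Sketch of loosening Gemini safety filters via the SDK's safety_settings
# argument. Category/threshold strings are from the Gemini API docs.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]

# Inside the extension, the call would then look roughly like
# (not executed here, requires the google-generativeai package and an API key):
# response = model.generate_content(prompt, safety_settings=safety_settings)
```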

wardensc2 commented 4 months ago

> @wardensc2 I use the free version of the Gemini Pro API with multi-attribute or complex object prompts and haven't encountered this issue. It takes me around 45 seconds to generate one image with an SDXL Pony model on an RTX 3080 and 16GB RAM. Sadly, the Gemini API heavily censors any content it thinks is even a little violent or NSFW, sometimes with false positives, and will throw an error before generation begins. This censorship behavior cannot be changed without altering the code (perhaps I should open an issue about this?) @zydxt
>
> Other than this issue with Gemini, which renders RPG DiffusionMaster unusable for me due to my... preferences, there should be no issue with the free quota limit, which is 60 requests per minute. If anyone can generate 1 image per second with this project, then I applaud their government supercomputer. (I'm just teasing you here, wardensc.) It makes for a fair alternative until this project supports local LLMs.
>
> However, if you were getting flagged for censorship, you would see "Error" text everywhere. I'm not sure what to recommend here, but you could try disabling all your extensions except RPG and RegionalPrompter, making certain RP and WebUI are up to date, or reinstalling A1111.

Yes, maybe my dynamic prompt has some sensitive words such as body shape, breast shape, underwear, or something similar. I guess we have to wait for Zydxt to update the extension to use a local LLM, which has more freedom in the words it generates.

Heather95 commented 4 months ago

> Yes, maybe my dynamic prompt has some sensitive words such as body shape, breast shape, underwear, or something similar. I guess we have to wait for Zydxt to update the extension to use a local LLM, which has more freedom in the words it generates.

Yep, that's the issue here. Gemini detected censored content and canceled your image generation. I opened a new issue about this, which you can subscribe to here in case any updates come through: zydxt/sd-webui-rpg-diffusionmaster#12

BetaDoggo commented 3 months ago

I hacked together some basic local model support using llama-cpp-python. Honestly, I don't think it's worth using local models for this unless you can push at minimum 50+ tokens per second, which not many people can. Though Gemini Pro is heavily censored, it's free and relatively fast.
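To put that 50 tokens/second threshold in perspective, here is some back-of-the-envelope latency math. Only the tokens-per-second figures come from the comment above; the 400-token plan size is an illustrative guess at how long an RPG-style region plan might be.

```python
# Rough latency of the LLM planning step at different generation speeds.
# response_tokens (400) is an assumed plan length, not a measured value.

def planning_latency_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent waiting on the LLM before diffusion can even start."""
    return response_tokens / tokens_per_second

fast_local = planning_latency_seconds(400, 50.0)  # a tolerable wait per image
slow_local = planning_latency_seconds(400, 5.0)   # an order of magnitude worse
print(fast_local, slow_local)
```

At 50 tok/s the plan costs about 8 seconds on top of each generation; at 5 tok/s (common for large models on modest GPUs) it balloons to over a minute, which is the point BetaDoggo is making.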

The repo is here: https://github.com/BetaDoggo/sd-webui-rpg-diffusionmaster-local. For GPU support you need to compile llama-cpp-python with CUDA. I'm too GPU-poor to test it properly, but it should work.
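For reference, the llama-cpp-python README documents building with CUDA by passing CMake flags through pip. The exact flag has changed across releases (older versions used `-DLLAMA_CUBLAS=on`, newer ones `-DGGML_CUDA=on`), so check the README for the version you are installing; this is a setup sketch, not a guaranteed one-liner:

```shell
# Rebuild llama-cpp-python with CUDA support (per the llama-cpp-python README).
# Requires the CUDA toolkit and a C/C++ compiler on the PATH.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```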

Heather95 commented 3 months ago

> I hacked together some basic local model support using llama-cpp-python. Honestly, I don't think it's worth using local models for this unless you can push at minimum 50+ tokens per second, which not many people can. Though Gemini Pro is heavily censored, it's free and relatively fast.

@BetaDoggo Thanks for that; maybe you could create a PR. To be fair, we can avoid the Gemini censorship issue by using softer/nicer words and editing the output afterwards. For example, for a fight scene, type "Bob argued with Adam" instead of "Bob threw a punch at Adam", then edit the prompt it gives you to correct it. For ""other"" NSFW scenes, simply say the characters were hugging. Far from perfect, but it works fine for testing what seems like an early version of RPG DiffusionMaster.

For anyone who's okay with inpainting, Fooocus has a similar solution to RPG. Simply enable Advanced in Fooocus and go to the Style tab to choose which type of prompt enhancement you would like. Create your base image, move to the Inpaint tab, choose your method, and inpaint each new region one at a time. Sorry that I went on a tangent here, but I hope all of this helps somebody out.

zydxt commented 2 months ago

Sorry, guys. I was struggling with the transformers library for local LLMs. Could you please try the local-llm branch as well as @BetaDoggo's llama-cpp solution and give some feedback on the two approaches? I'm having issues with both implementations.

zydxt commented 2 months ago

Hey guys. I've merged the local LLM PR from @BetaDoggo. Have fun!