mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. It can generate text, audio, video, and images, and also offers voice cloning.
https://localai.io
MIT License

diffusers: add esrgan #917

Status: Open (mudler opened 10 months ago)

mudler commented 10 months ago

Add ESRGAN support to upscale generated images to a larger output size.

https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler

localai-bot commented 10 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler). Please be aware that I might hallucinate sometimes!

But I can also be funny or helpful :smilecat:, and I can generally provide good tips or point to places to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me; I don't support replying yet!

:warning::warning::warning::warning::warning:

To add ESRGAN support for enhancing the image output size, you can use the following steps:

  1. Install the Hugging Face diffusers library, which provides StableDiffusionXLPipeline, together with transformers, using pip in your terminal:
    pip install diffusers transformers
  2. Enable ESRGAN in the model's LocalAI configuration file in the models directory, under the diffusers section:
    # Use ESRGAN for image output size enhancement
    diffusers:
      pipeline_type: StableDiffusionXLPipeline
      cuda: false # Set to true for GPU usage (CUDA)
      scheduler_type: EulerAncestralDiscreteScheduler
      esrgan: true
  3. Use the esrgan parameter in the image generation request to enable the output size enhancement:
    curl http://localhost:8080/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "<positive prompt>|<negative prompt>", 
      "model": "animagine-xl", 
      "step": 51,
      "size": "1024x1024",
      "esrgan": true
    }'

    This will generate an image with enhanced output size using the ESRGAN backend.
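The request in step 3 could also be sent from Python. The following is a minimal sketch using only the standard library. It assumes a LocalAI instance listening on localhost:8080, and it reuses the esrgan field exactly as suggested above; that field is taken from this comment and is not confirmed to exist in LocalAI. The helper name build_request is hypothetical.

```python
import json
import urllib.request


def build_request(prompt, negative_prompt="", base_url="http://localhost:8080", esrgan=True):
    """Build the image generation request from step 3 (hypothetical helper)."""
    payload = {
        # Positive and negative prompts are separated by "|", as in the curl example.
        "prompt": f"{prompt}|{negative_prompt}",
        "model": "animagine-xl",
        "step": 51,
        "size": "1024x1024",
        # Assumption: "esrgan" is the field proposed in this thread, not a confirmed option.
        "esrgan": esrgan,
    }
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request("a lighthouse at dusk", "blurry")
    # Sending the request needs a running server; the response is printed as parsed JSON.
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```

Building the request separately from sending it keeps the payload easy to inspect before any server is involved.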


LeonSijiaLu commented 2 months ago

Hello @mudler, I would like to take on this ticket.

Before I begin, I want to confirm my understanding. I think the primary goal is to extend the Diffusers options with an additional boolean field, Upscaler. If this field is set to true, the Python server will use the 'StableDiffusionUpscalePipeline'. Please correct me if I have misunderstood.
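The idea described above might look roughly like the following Python sketch. It is not the LocalAI backend code: the DiffusersOptions class, the upscaler field name, and both helper functions are hypothetical and only illustrate switching to StableDiffusionUpscalePipeline when the flag is set. The model ID is the one linked in the issue, and the diffusers import is deferred so the selection logic can run without the library installed.

```python
from dataclasses import dataclass

# Model linked in the issue description.
UPSCALER_MODEL = "stabilityai/stable-diffusion-x4-upscaler"


@dataclass
class DiffusersOptions:
    """Hypothetical subset of backend options; `upscaler` is the proposed new field."""
    pipeline_type: str = "StableDiffusionXLPipeline"
    upscaler: bool = False


def resolve_pipeline_type(options: DiffusersOptions) -> str:
    """Return the name of the pipeline class to load for the given options."""
    if options.upscaler:
        return "StableDiffusionUpscalePipeline"
    return options.pipeline_type


def upscale(prompt, low_res_image, device="cpu"):
    """Upscale a PIL image with the x4 upscaler and return the resulting image."""
    # Deferred import: diffusers (and its model download) is only needed here.
    from diffusers import StableDiffusionUpscalePipeline

    pipeline = StableDiffusionUpscalePipeline.from_pretrained(UPSCALER_MODEL)
    pipeline = pipeline.to(device)
    return pipeline(prompt=prompt, image=low_res_image).images[0]
```

With these assumptions, resolve_pipeline_type(DiffusersOptions(upscaler=True)) selects the upscale pipeline, while the default options keep the configured pipeline_type.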