kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies
Other
1.28k stars 107 forks

[Feature Request]: WebAPI #22

Closed kabachuha closed 1 year ago

kabachuha commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

This extension needs its own REST API so that other applications can interact with it. That would make it usable as part of video generation services, such as Discord bots.

Proposed workflow

  1. Make an app which is able to send REST API requests
  2. Send a request
  3. It's processed by the auto plugin
  4. The result is sent back, or if it fails, an error message is sent instead
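From the client side, the workflow above could be sketched roughly as follows. This is only an illustration, not part of the extension: the base URL (the webui's usual local address) and the `/t2v/run` route are assumptions (the route happens to match the proof of concept posted later in this thread), and `build_url` and `request_video` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed values for illustration; adjust to wherever the webui is running.
BASE_URL = "http://127.0.0.1:7860"
ROUTE = "/t2v/run"


def build_url(prompt: str) -> str:
    """Steps 1-2: encode the prompt as a query string on the request URL."""
    query = urllib.parse.urlencode({"prompt": prompt})
    return f"{BASE_URL}{ROUTE}?{query}"


def request_video(prompt: str, timeout: float = 600.0) -> dict:
    """Steps 2-4: send the request, return the result or an error message."""
    try:
        with urllib.request.urlopen(build_url(prompt), timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
        return {"ok": True, "result": json.loads(body)}
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return {"ok": False, "error": f"HTTP {exc.code}: {exc.reason}"}
    except ValueError as exc:
        # The server answered, but the body was not valid JSON.
        return {"ok": False, "error": f"invalid JSON: {exc}"}
    except OSError as exc:
        # Connection refused, timeout, DNS failure, and similar.
        return {"ok": False, "error": str(exc)}
```

A bot would call `request_video("a cat playing piano")` and check the `ok` flag before using `result`. Video generation can take a while, hence the generous default timeout.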

Additional information

No response

Straafe commented 1 year ago

+1 to this. I have a fully featured SD bot that uses A1111's API as a backend (https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API), so an API for this extension would let me add it to that bot easily.

simple6502 commented 1 year ago

+1 to this as well. I have yet to find an integration between ModelScope and an API, which would be very nice for a Discord bot I am working on.

Straafe commented 1 year ago

@simple6502 @kabachuha I wrote a basic proof of concept for this that works. It only supports text2vid and only takes the prompt from the user (it uses defaults for all other parameters), but it does work. In the image below you can see that the new endpoint is listed among the A1111 APIs and that it returns the resulting mp4 file as a base64-encoded UTF-8 string.

example of api working: image

you can see the result is indeed the video: image

If someone wants to flesh it out and do a pr, be my guest, but if you just want to bang it in right now, here is what I did:

add api.py to the extension's scripts directory:

import logging
import os
import sys
from types import SimpleNamespace
from fastapi import FastAPI, Query
from fastapi.encoders import jsonable_encoder
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse
from starlette import status
from starlette.requests import Request

current_directory = os.path.dirname(os.path.abspath(__file__))
if current_directory not in sys.path:
    sys.path.append(current_directory)

import text2vid
from scripts.video_audio_utils import find_ffmpeg_binary

logger = logging.getLogger(__name__)

def t2v_api(_, app: FastAPI):
    logger.debug("Loading T2V API Endpoints.")
    @app.exception_handler(RequestValidationError)
    async def validation_exception_handler(request: Request, exc: RequestValidationError):
        return JSONResponse(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            content=jsonable_encoder({"detail": exc.errors(), "body": exc.body}),
        )

    @app.get("/t2v/run")
    async def t2v_run(
            prompt: str = Query("", description="prompt")
    ):
        """
        Run text2video over the API and return the mp4 as a base64 string.
        """
        # Default output settings; only the prompt comes from the caller.
        dv = SimpleNamespace(**text2vid.DeforumOutputArgs())
        # Every other argument is a hard-coded default for this proof of concept.
        videodat = text2vid.process(
            dv.skip_video_creation, find_ffmpeg_binary(), dv.ffmpeg_crf, 'slow',
            dv.fps, dv.add_soundtrack, dv.soundtrack_path,
            prompt, 'text, watermark, copyright, blurry',
            30, 24, -1, 7, 256, 256, 0,
            '', '', '', '', -1, '', '', '', '', '',
        )

        return JSONResponse(content={"mp4": videodat})

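try:
    import modules.script_callbacks as script_callbacks

    # Register the endpoints once the webui's FastAPI app has started.
    script_callbacks.on_app_started(t2v_api)
    logger.debug("SD-Webui API layer loaded")
except ImportError:
    # Not running inside the webui, so there is nothing to hook into.
    logger.debug("Unable to import script callbacks.")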

Edit the existing script text2vid.py a bit so the "process" functions return the video data. As an example, this is what I changed the process functions to return:

b64encode(mp4).decode("utf-8")

Now the new api can run the process function and return to you the video data.
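On the receiving end, the base64 string has to be decoded back into bytes before it can be played. A minimal sketch, assuming the response body is the JSON object `{"mp4": "<base64 string>"}` that the endpoint above returns; `encode_video` and `save_mp4` are hypothetical helper names, not part of the extension.

```python
import base64
import json
from pathlib import Path


def encode_video(mp4_bytes: bytes) -> str:
    """Server side: the same expression as the return value shown above."""
    return base64.b64encode(mp4_bytes).decode("utf-8")


def save_mp4(response_body: str, out_path: str) -> int:
    """Client side: decode the "mp4" field and write it to out_path.

    Returns the number of bytes written.
    """
    payload = json.loads(response_body)
    video_bytes = base64.b64decode(payload["mp4"])
    Path(out_path).write_bytes(video_bytes)
    return len(video_bytes)
```

Base64 inflates the payload by about a third, which is acceptable for short clips. For longer videos, returning the file directly rather than embedding it in JSON would probably be the better design.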

If someone spent more time on it, they could add endpoints for both text2vid and vid2vid and expose all the other parameters as well, but all I really needed was to let the user input a prompt and get a quick video.

kabachuha commented 1 year ago

@Straafe TYSM for your contribution!

I'll look into that once my study load lightens a little (hopefully within a day, but I'm not sure).