s0md3v / sd-webui-roop

roop extension for StableDiffusion web-ui
GNU Affero General Public License v3.0
3.4k stars 884 forks

Use Python script to automate Stable Diffusion WebUI img2img process with Roop extension enabled #200

Open shivanraptor opened 1 year ago

shivanraptor commented 1 year ago

I would like to create a Python script to automate Stable Diffusion WebUI img2img process with Roop extension enabled.

Input:

Output:

Here is my attempt (given that SD WebUI is already running at http://127.0.0.1:7860):

import json
import base64
import requests
import os
from tqdm import tqdm
import time

stdip = 'http://127.0.0.1:7860' # Default running on localhost
filefolderpath = '/home/myusername/automatic1111/outputdata/' # Root directory for images; it must contain 'image' and 'mask' subfolders, and output images are written here as well

def submit_post(url: str, data: dict):
    # json= serializes the payload and sets the Content-Type header
    return requests.post(url, json=data, timeout=10)

def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
        base64_string = encoded_string.decode("utf-8")
    return base64_string

def save_encoded_image(b64_image, output_path):
    # 'wb' mode creates the file if it does not exist
    with open(output_path, 'wb') as f:
        f.write(base64.b64decode(b64_image))

if __name__ == '__main__':
    img2img_url = stdip + '/sdapi/v1/img2img'

    for root, dirs, files in os.walk(filefolderpath + '/image/'):
        for file_name in files:
            if os.path.exists(filefolderpath + '/mask/' + file_name):
                data = {
                    "init_images": [image_to_base64(filefolderpath + '/image/' + file_name)], # base64-encoded source image
                    "denoising_strength": 0.45, # 0-1; lower values stay closer to the original image
                    "image_cfg_scale": 0, # CFG scale for the init image; only used by instruct-pix2pix-style models
                    "mask": image_to_base64(filefolderpath + '/mask/' + file_name), # base64-encoded mask image
                    "mask_blur": 5, # feathering strength applied to the mask edge
                    "masked content": "original", # possibly not a valid API key; may need to be removed
                    "inpainting_mask_invert": 1, # which side of the mask is repainted; in this state black is repainted, white is left untouched
                    "initial_noise_multiplier": 0, # higher values drift further from the original image
                    #"prompt": "healthy right eye,healthy left eye,white conjunctiva,both eyes are the same size,eyes are sparking,Asian,hypernet:eye_surgery_predictionv3:1",
                    "prompt": "healthy right eye,healthy left eye,white conjunctiva,both eyes are the same size,eyes are sparking,Asian,",
                    "negative_prompt": "bad right eye,bad left eye,bad eyes,ugly face,blurry,simple",
                    "styles": [],
                    "seed": -1, # -1 picks a random seed
                    "subseed": -1,
                    "subseed_strength": 0,
                    "batch_size": 1, # images generated per batch
                    "n_iter": 1, # number of batches
                    "steps": 70, # sampling steps; the WebUI caps this at 150
                    "cfg_scale": 7, # prompt influence on the image; usually 5-15, max 30 in the WebUI
                    "width": 848,
                    "height": 544,
                    "restore_faces": False, # face correction; suggest False for now
                    "tiling": False, # make left/right and top/bottom edges match; usually False
                    "eta": 0,
                    "sampler_index": "DPM++ 2M Karras", # sampling method; good quality and fast
                    "resize_mode": 1, # 0 = stretch, 1 = crop, 2 = pad, 3 = latent upscale
                    "inpainting_fill": 1, # leaving all three inpaint params at their defaults
                    "inpaint_full_res": True,
                    "inpaint_full_res_padding": 4,
                    "override_settings": {"sd_model_checkpoint": "photon_v1.safetensors"},
                    "hypernetwork_model": ["dic_demosaicing.pt"],
                    # only one "script_args" key is kept; a duplicate key in a dict literal is silently overwritten
                    "script_args": [0, True, True, "hypernetwork_model", "dic_demosaicing.pt", 1, 1],
                    "sd_vae": "Automatic"
                }

                response = submit_post(img2img_url, data)
                save_image_path = os.path.join(filefolderpath, file_name)
                save_encoded_image(response.json()['images'][0], save_image_path)

But how can I set the Roop extension parameters (input image and other settings) in the script? Thanks.

Gourieff commented 1 year ago

Just add "alwayson_scripts": {"roop": {"args": args}} to your data, where args is the list of arguments for the roop script. You can find them via the URL http://127.0.0.1:7860/sdapi/v1/script-info (search for roop there). My API example that works with my fork: https://github.com/Gourieff/sd-webui-reactor/blob/main/example/api_example.py
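Following the suggestion above, here is a minimal sketch of how the roop entry plugs into an img2img payload. The placeholder strings and the three-element args list are illustrative only; the real argument list and its order must come from /sdapi/v1/script-info.

```python
import json

# Illustrative roop argument list; the actual entries and their order
# must match what /sdapi/v1/script-info reports for the roop script
args = ["<base64 face image>", True, "0"]

payload = {
    "init_images": ["<base64 init image>"],
    "denoising_strength": 0.5,
    # the roop script rides along with the normal img2img payload
    "alwayson_scripts": {"roop": {"args": args}},
}

print(json.dumps(payload["alwayson_scripts"], indent=2))
```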

shivanraptor commented 1 year ago

I am using a Gradio public link since I am running on a remote server, and the script-info URL reports:

{"detail":"Not Found"}

UPDATE: I forgot to add the --api flag to the launch command. It works fine now.
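For reference, a sketch of the launch command with the API enabled; the script name assumes a standard AUTOMATIC1111 checkout on Linux (use webui-user.bat on Windows):

```shell
# Expose the /sdapi/v1/* endpoints alongside the UI;
# --share additionally creates a public Gradio link
./webui.sh --api --share
```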

shivanraptor commented 1 year ago

@Gourieff thanks for your API sample; with it I was able to construct my script. However, it seems that the input image to Roop is ignored.

Here is my script:

import base64, io, requests, json
import os
from tqdm import tqdm
from datetime import datetime, date
import time

def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
        base64_string = encoded_string.decode("utf-8")
    return base64_string

def submit_post(url: str, data: dict):
    # json= serializes the payload and sets the Content-Type header
    return requests.post(url, json=data, timeout=10)

def save_encoded_image(b64_image, output_path):
    # 'wb' mode creates the file if it does not exist
    with open(output_path, 'wb') as f:
        f.write(base64.b64decode(b64_image))

stdip = 'http://127.0.0.1:7860'
base_path = '/home/username/automatic1111/'
file_input = 'input.jpg'                             # the face to swap
file_ref = 'artwork.jpg'                             # the source image
filefolderpath = 'outputdata/' 

now = datetime.now()   # renamed from `time`, which shadowed the imported time module
today = date.today()
current_date = today.strftime('%Y-%m-%d')
current_time = now.strftime('%H-%M-%S')

file_output = filefolderpath + 'output_' + current_date + '_' + current_time + '.png'

# convert input and reference files to base64
file_input_base64 = image_to_base64(file_input)
file_ref_base64 = image_to_base64(file_ref)

# Roop arguments:
args = [
    file_input_base64,                                                      #0 File Input
    True,                                                                   #1 Enable Roop
    '0',                                                                    #2 Comma separated face number(s)
    base_path + 'stable-diffusion-webui/models/roop/inswapper_128.onnx',    #3 Model
    'CodeFormer',                                                           #4 Restore Face: None; CodeFormer; GFPGAN
    1,                                                                      #5 Restore visibility value
    True,                                                                   #6 Restore face -> Upscale
    'None',                                                                 #7 Upscaler ('None' if not needed); see the full list via http://127.0.0.1:7860/sdapi/v1/script-info -> roop-ge -> sec.8
    1,                                                                      #8 Upscaler scale value
    1,                                                                      #9 Upscaler visibility (if scale = 1)
    False,                                                                  #10 Swap in source image
    True,                                                                   #11 Swap in generated image
]

data = {
    "init_images": [file_ref_base64], # Original image address
    "denoising_strength": 0.5, # Range 0-1, smaller value closer to original image. Larger value more likely to let imagination fly
    "prompt": "",
    "negative_prompt": "",
    "seed": -1, # Initial seed
    "batch_size": 1, # How many images generated each time
    "n_iter": 1, # number of iterations
    "steps": 70, # Number of runs, this value can be fine tuned, converging when too high, max 150 in webui, maybe can go higher here?
    "cfg_scale": 7, # Influence of prompt text on image, usually 5-15, max 30 in webui, can fine tune
    "width": 512,
    "height": 768,
    "restore_faces": False, # Whether to correct faces, for 3D, test later if open or not. Suggest False for now
    "sampler_name": "DDIM",
    "sampler_index": "DDIM", # or "DPM++ 2M Karras"
    "override_settings": {"sd_model_checkpoint": "realisticVisionV40_v40VAE.safetensors",},
    "alwayson_scripts": {"roop": {"is_img2img": True, "is_alwayson": True, "args": args}}
}

img2img_url = stdip + '/sdapi/v1/img2img'
response = submit_post(img2img_url, data)
try:
    save_encoded_image(response.json()['images'][0], file_output)
except KeyError as ke:
    print(response.json())

In the Web UI, the face is successfully swapped. But in the script, the face is not swapped. What did I miss?

Generated from the Web UI, the logs are as follows:

Running DDIM Sampling with 35 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████████| 35/35 [00:23<00:00,  1.52it/s]
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
2023-08-02 13:59:20,608 - roop - INFO - Restore face with CodeFormer
Total progress: 105it [05:30,  3.15s/it]
Total progress: 105it [05:30,  1.51it/s]

Generated from my script, the log is as follows:

2023-08-02 14:04:32,223 - roop - INFO - roop enabled, face index {0}
2023-08-02 14:04:32,223 - roop - INFO - Swap in source 0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/username/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
2023-08-02 14:04:35,236 - roop - INFO - Restore face with CodeFormer
Running DDIM Sampling with 35 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████████| 35/35 [00:02<00:00, 13.91it/s]
Total progress:  94%|██████████████████████████████████████████████████████████████████    | 34/36 [00:02<00:00, 13.90it/s]
Total progress:  97%|████████████████████████████████████████████████████████████████████  | 35/36 [00:15<00:00, 13.90it/s]

WHUZhangYuhan commented 1 year ago

I ran into the same problem. When I compared my code against the author's example for reading the image, the base64 content I produced was different. After making the modification below, I got the face-swapped picture.

import base64
import io
from PIL import Image

image_file = "path/to/local/image/file"
im = Image.open(image_file)

img_bytes = io.BytesIO()
im.save(img_bytes, format='PNG')
img_base64 = base64.b64encode(img_bytes.getvalue()).decode('utf-8')
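The two approaches differ because they encode different bytes, not because base64 itself is lossy: reading the file directly encodes the original (e.g. JPEG) byte stream, while the PIL route re-encodes the pixels as PNG first. A stdlib-only sketch of that point, using dummy stand-in byte strings rather than real image data:

```python
import base64

# Stand-ins for "raw JPEG file bytes" and "the same image re-encoded as PNG";
# real image files would differ in the same way
raw_jpeg_bytes = b"\xff\xd8\xff\xe0 original jpeg byte stream"
png_reencoded_bytes = b"\x89PNG\r\n\x1a\n re-encoded byte stream"

b64_raw = base64.b64encode(raw_jpeg_bytes).decode("utf-8")
b64_png = base64.b64encode(png_reencoded_bytes).decode("utf-8")

# base64 round-trips losslessly, so any difference comes from the input bytes
assert base64.b64decode(b64_raw) == raw_jpeg_bytes
assert b64_raw != b64_png
```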

shivanraptor commented 1 year ago

@WHUZhangYuhan my approach to generating the base64-encoded image does not require PIL. Do you mean the base64-encoded image produced by the code in https://github.com/Gourieff/sd-webui-roop-nsfw/blob/main/example/api_example.py is invalid?

WHUZhangYuhan commented 1 year ago

@shivanraptor

def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
        base64_string = encoded_string.decode("utf-8")
    return base64_string

I did use that method, and it does not work for me.

When I switched to the method in the commit below, it worked. The two methods produce different base64 output.

https://github.com/s0md3v/sd-webui-roop/pull/101/commits/e3bbbd1c150d30e6184314b45a37dde2797bbd19

shivanraptor commented 1 year ago

Alright. Back to my issue: I found that setting #10 (Swap in source image) to True in my code produces a successful result.
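For anyone hitting the same thing, the change is a single flag in the args list from the script above. The placeholder strings here stand in for the real base64 image and model path:

```python
# Roop args with index 10 ("Swap in source image") flipped to True;
# placeholder strings stand in for the real base64 image and model path
args = [
    "<base64 face image>",            # 0 file input
    True,                             # 1 enable roop
    "0",                              # 2 comma-separated face number(s)
    "<path to inswapper_128.onnx>",   # 3 model
    "CodeFormer",                     # 4 restore face: None / CodeFormer / GFPGAN
    1,                                # 5 restore visibility value
    True,                             # 6 restore face -> upscale
    "None",                           # 7 upscaler
    1,                                # 8 upscaler scale value
    1,                                # 9 upscaler visibility (if scale = 1)
    True,                             # 10 swap in source image (was False)
    True,                             # 11 swap in generated image
]
```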

wuzikai18 commented 1 year ago

Did you succeed?

shivanraptor commented 1 year ago

Did you succeed?

Yes, the above script works after setting the parameter to True. However, I found that Roop can be used WITHOUT Stable Diffusion to achieve the same result. See https://huggingface.co/spaces/ezioruan/roop