instantX-research / InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
https://instantid.github.io/
Apache License 2.0

Support offloading with accelerate #34

Open dm33tri opened 8 months ago

dm33tri commented 8 months ago

When running with pipe.enable_model_cpu_offload(), the pipeline throws an error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

On cards with low VRAM, CPU offloading drastically speeds up generation.

Can it be implemented?
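
For reference, a minimal sketch of the setup that triggers it (checkpoint paths and the base model are placeholders, and the face-embedding step is omitted):

import torch
from diffusers.models import ControlNetModel
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline

# placeholder checkpoint paths
controlnet = ControlNetModel.from_pretrained("checkpoints/ControlNetModel", torch_dtype=torch.float16)

pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "wangqixun/YamerMIX_v8",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.load_ip_adapter_instantid("checkpoints/ip-adapter.bin")

# components stay on CPU and are moved to the GPU only when needed
pipe.enable_model_cpu_offload()

# any pipe(...) call now fails inside image_proj_model with the device-mismatch error above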

haofanwang commented 8 months ago

I'll take a look.

dm33tri commented 8 months ago

I was able to do something like this, but it's just a quick fix:

prompt_image_emb = prompt_image_emb.to(
    device=self.image_proj_model.latents.device,
    dtype=self.image_proj_model.latents.dtype
)
prompt_image_emb = self.image_proj_model(prompt_image_emb)
return prompt_image_emb.to(device=device, dtype=dtype)

I also force fp32 on image_proj_model when it runs on the CPU.

You could also force image_proj_model onto CUDA, but I think that would be slower on my 10 GB card.
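
Roughly, the whole _encode_prompt_image_emb in pipeline_stable_diffusion_xl_instantid.py ends up like this with the patch in place (a sketch from memory, so details such as the signature and the reshape attribute name may not match the repo exactly; torch is already imported in that file):

def _encode_prompt_image_emb(self, prompt_image_emb, device, dtype, do_classifier_free_guidance):
    # turn the face embedding into a (1, N, C) tensor for the resampler
    if isinstance(prompt_image_emb, torch.Tensor):
        prompt_image_emb = prompt_image_emb.clone().detach()
    else:
        prompt_image_emb = torch.tensor(prompt_image_emb)
    prompt_image_emb = prompt_image_emb.reshape([1, -1, self.image_proj_model_in_features])

    if do_classifier_free_guidance:
        prompt_image_emb = torch.cat([torch.zeros_like(prompt_image_emb), prompt_image_emb], dim=0)

    # quick fix: run the projection wherever the (possibly offloaded)
    # image_proj_model currently lives, then move the result back
    prompt_image_emb = prompt_image_emb.to(
        device=self.image_proj_model.latents.device,
        dtype=self.image_proj_model.latents.dtype
    )
    prompt_image_emb = self.image_proj_model(prompt_image_emb)
    return prompt_image_emb.to(device=device, dtype=dtype)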

haofanwang commented 8 months ago

Thx

ResearcherXman commented 8 months ago

I cannot reproduce this problem.

pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    base_model_path,
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
pipe.cuda()
pipe.load_ip_adapter_instantid(face_adapter)

pipe.enable_model_cpu_offload() can lower VRAM usage.

FurkanGozukara commented 7 months ago

@dm33tri I am also getting this error, can you help?

Here is my error log and code:

G:\instant id auto installer\venv\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Start inference...
[Debug] Prompt: watercolor painting, a man. vibrant, beautiful, painterly, detailed, textural, artistic,
[Debug] Neg Prompt: (lowres, low quality, worst quality:1.2), (text:1.2), watermark, anime, photorealistic, 35mm film, deformed, glitch, low contrast, noisy (lowres, low quality, worst quality:1.2), (text:1.2), watermark, (frame:1.2), deformed, ugly, deformed eyes, blur, out of focus, blurry, deformed cat, deformed, photo, anthropomorphic cat, monochrome, pet collar, gun, weapon, blue, 3d, drones, drone, buildings in background, green
Traceback (most recent call last):
  File "G:\instant id auto installer\venv\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "G:\instant id auto installer\venv\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "G:\instant id auto installer\venv\lib\site-packages\gradio\blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "G:\instant id auto installer\venv\lib\site-packages\gradio\blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "G:\instant id auto installer\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "G:\instant id auto installer\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "G:\instant id auto installer\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "G:\instant id auto installer\venv\lib\site-packages\gradio\utils.py", line 695, in wrapper
    response = f(*args, **kwargs)
  File "G:\instant id auto installer\venv\lib\site-packages\gradio\utils.py", line 695, in wrapper
    response = f(*args, **kwargs)
  File "G:\instant id auto installer\web-ui.py", line 216, in generate_image
    images = pipe(
  File "G:\instant id auto installer\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "G:\instant id auto installer\pipeline_stable_diffusion_xl_instantid.py", line 522, in __call__
    prompt_image_emb = self._encode_prompt_image_emb(image_embeds,
  File "G:\instant id auto installer\pipeline_stable_diffusion_xl_instantid.py", line 235, in _encode_prompt_image_emb
    prompt_image_emb = self.image_proj_model(prompt_image_emb)
  File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\instant id auto installer\venv\lib\site-packages\ip_adapter\resampler.py", line 135, in forward
    x = self.proj_in(x)
  File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

And here is the relevant part of my web-ui.py:

MAX_SEED = np.iinfo(np.int32).max
device = get_torch_device()
dtype = torch.float16 if str(device).__contains__("cuda") else torch.float32
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "Watercolor"

# Load face encoder
app = FaceAnalysis(name='antelopev2', root='checkpoints', providers=['CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

# Path to InstantID models
face_adapter = f'checkpoints/ip-adapter.bin'
controlnet_path = f'checkpoints/ControlNetModel'

# Load pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=dtype)

def main(pretrained_model_name_or_path="wangqixun/YamerMIX_v8"):

    if pretrained_model_name_or_path.endswith(
            ".ckpt"
        ) or pretrained_model_name_or_path.endswith(".safetensors"):
            scheduler_kwargs = hf_hub_download(
                repo_id="wangqixun/YamerMIX_v8",
                subfolder="scheduler",
                filename="scheduler_config.json",
            )

            (tokenizers, text_encoders, unet, _, vae) = load_models_xl(
                pretrained_model_name_or_path=pretrained_model_name_or_path,
                scheduler_name=None,
                weight_dtype=dtype,
            )

            scheduler = diffusers.EulerDiscreteScheduler.from_config(scheduler_kwargs)
            pipe = StableDiffusionXLInstantIDPipeline(
                vae=vae,
                text_encoder=text_encoders[0],
                text_encoder_2=text_encoders[1],
                tokenizer=tokenizers[0],
                tokenizer_2=tokenizers[1],
                unet=unet,
                scheduler=scheduler,
                controlnet=controlnet,
            )

    else:
        pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
            pretrained_model_name_or_path,
            controlnet=controlnet,
            torch_dtype=dtype,
            safety_checker=None,
            feature_extractor=None,
        )

        pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)

    pipe.enable_model_cpu_offload()
    pipe.load_ip_adapter_instantid(face_adapter)

    def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
        if randomize_seed:
            seed = random.randint(0, MAX_SEED)
        return seed

    def swap_to_gallery(images):
        return gr.update(value=images, visible=True), gr.update(visible=True), gr.update(visible=False)

    def upload_example_to_gallery(images, prompt, style, negative_prompt):
        return gr.update(value=images, visible=True), gr.update(visible=True), gr.update(visible=False)

    def remove_back_to_files():
        return gr.update(visible=False), gr.update(visible=False), gr.update(visible=True)

    def remove_tips():
        return gr.update(visible=False)

    def convert_from_cv2_to_image(img: np.ndarray) -> Image:
        return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    def convert_from_image_to_cv2(img: Image) -> np.ndarray:
        return cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)

    def draw_kps(image_pil, kps, color_list=[(255,0,0), (0,255,0), (0,0,255), (255,255,0), (255,0,255)]):
        stickwidth = 4
        limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
        kps = np.array(kps)

        w, h = image_pil.size
        out_img = np.zeros([h, w, 3])

        for i in range(len(limbSeq)):
            index = limbSeq[i]
            color = color_list[index[0]]

            x = kps[index][:, 0]
            y = kps[index][:, 1]
            length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
            angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
            polygon = cv2.ellipse2Poly((int(np.mean(x)), int(np.mean(y))), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
            out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
        out_img = (out_img * 0.6).astype(np.uint8)

        for idx_kp, kp in enumerate(kps):
            color = color_list[idx_kp]
            x, y = kp
            out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)

        out_img_pil = Image.fromarray(out_img.astype(np.uint8))
        return out_img_pil

    def resize_img(input_image, max_side=1280, min_side=1024, size=None, 
                pad_to_max_side=False, mode=PIL.Image.BILINEAR, base_pixel_number=64):

            w, h = input_image.size
            if size is not None:
                w_resize_new, h_resize_new = size
            else:
                ratio = min_side / min(h, w)
                w, h = round(ratio*w), round(ratio*h)
                ratio = max_side / max(h, w)
                input_image = input_image.resize([round(ratio*w), round(ratio*h)], mode)
                w_resize_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
                h_resize_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
            input_image = input_image.resize([w_resize_new, h_resize_new], mode)

            if pad_to_max_side:
                res = np.ones([max_side, max_side, 3], dtype=np.uint8) * 255
                offset_x = (max_side - w_resize_new) // 2
                offset_y = (max_side - h_resize_new) // 2
                res[offset_y:offset_y+h_resize_new, offset_x:offset_x+w_resize_new] = np.array(input_image)
                input_image = Image.fromarray(res)
            return input_image

    def apply_style(style_name: str, positive: str, negative: str = "") -> tuple[str, str]:
        p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
        return p.replace("{prompt}", positive), n + ' ' + negative

    def generate_image(face_image, pose_image, prompt, negative_prompt, style_name, num_steps, identitynet_strength_ratio, adapter_strength_ratio, guidance_scale, seed, progress=gr.Progress(track_tqdm=True)):

        if face_image is None:
            raise gr.Error(f"Cannot find any input face image! Please upload the face image")

        if prompt is None:
            prompt = "a person"

        # apply the style template
        prompt, negative_prompt = apply_style(style_name, prompt, negative_prompt)

        face_image = load_image(face_image[0])
        face_image = resize_img(face_image)
        face_image_cv2 = convert_from_image_to_cv2(face_image)
        height, width, _ = face_image_cv2.shape

        # Extract face features
        face_info = app.get(face_image_cv2)

        if len(face_info) == 0:
            raise gr.Error(f"Cannot find any face in the image! Please upload another person image")

        face_info = sorted(face_info, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]  # only use the maximum face
        face_emb = face_info['embedding']
        face_kps = draw_kps(convert_from_cv2_to_image(face_image_cv2), face_info['kps'])

        if pose_image is not None:
            pose_image = load_image(pose_image[0])
            pose_image = resize_img(pose_image)
            pose_image_cv2 = convert_from_image_to_cv2(pose_image)

            face_info = app.get(pose_image_cv2)

            if len(face_info) == 0:
                raise gr.Error(f"Cannot find any face in the reference image! Please upload another person image")

            face_info = face_info[-1]
            face_kps = draw_kps(pose_image, face_info['kps'])

            width, height = face_kps.size

        generator = torch.Generator(device=device).manual_seed(seed)

        print("Start inference...")
        print(f"[Debug] Prompt: {prompt}, \n[Debug] Neg Prompt: {negative_prompt}")

        pipe.set_ip_adapter_scale(adapter_strength_ratio)
        images = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image_embeds=face_emb,
            image=face_kps,
            controlnet_conditioning_scale=float(identitynet_strength_ratio),
            num_inference_steps=num_steps,
            guidance_scale=guidance_scale,
            height=height,
            width=width,
            generator=generator
        ).images

        return images, gr.update(visible=True)

FurkanGozukara commented 7 months ago

prompt_image_emb = self.image_proj_model(prompt_image_emb)

Can you send me your modified file please? I can't make it work.

FurkanGozukara commented 7 months ago

pipe.enable_model_cpu_offload()

because your code is still moving the pipeline onto the GPU, @ResearcherXman. This is the warning it prints:

It seems like you have activated model offloading by calling enable_model_cpu_offload, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, text_encoder, text_encoder_2, tokenizer, tokenizer_2, unet, controlnet, scheduler, feature_extractor, image_encoder to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: pipeline.to('cpu') or removing the move altogether if you use offloading.
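
So if I understand the warning correctly, the .cuda() / .to(device) call has to be dropped for offloading to do anything. Something like this (untested sketch, using the same variable names as above):

pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    base_model_path,          # placeholder
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.load_ip_adapter_instantid(face_adapter)

# enable offloading last, and do NOT call pipe.cuda() / pipe.to("cuda") afterwards;
# accelerate moves each component to the GPU on demand and back to CPU when done
pipe.enable_model_cpu_offload()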

FurkanGozukara commented 7 months ago

prompt_image_emb = prompt_image_emb.to(
    device=self.image_proj_model.latents.device,
    dtype=self.image_proj_model.latents.dtype
)
prompt_image_emb = self.image_proj_model(prompt_image_emb)
return prompt_image_emb.to(device=device, dtype=dtype)

You modified the entire _encode_prompt_image_emb function, right? Can you share it here? That is the culprit for me too.

haofanwang commented 7 months ago

Fixed. You can now use pipe.enable_model_cpu_offload(). For other optimization tricks, please let us know.
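
If VRAM is still tight, the standard diffusers memory helpers can be stacked on top of offloading; a rough sketch (whether each is worthwhile depends on your setup):

pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()   # decode latents in slices
pipe.enable_vae_tiling()    # tile the VAE for large resolutions
# if xformers is installed:
# pipe.enable_xformers_memory_efficient_attention()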

FurkanGozukara commented 7 months ago

I'm still working on a multi-ControlNet version with an improved Gradio UI and a 1-click installer, but it currently requires 24 GB.

[screenshot: Gradio web UI at 127.0.0.1:7860]

ResearcherXman commented 7 months ago

Could you share your minimal script to reproduce this error?

FurkanGozukara commented 7 months ago

Could you share your minimal script to reproduce this error?

It is basically your multi-ControlNet web UI; I just changed the model loading logic.

def get_model_names():
    models_dir = 'models'
    if not os.path.exists(models_dir):
        os.makedirs(models_dir)
    model_files = [f for f in os.listdir(models_dir) if f.endswith('.safetensors')]
    return model_files

def assign_last_params():
    global pipe
    pipe.enable_model_cpu_offload()
    pipe.to(device)

    pipe.load_ip_adapter_instantid(face_adapter)

    pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)
    # load and disable LCM
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
    pipe.disable_lora()

    print("Model loaded successfully.")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

def main(pretrained_model_name_or_path="wangqixun/YamerMIX_v8", enable_lcm_arg=False):

    global pipe  # Declare pipe as a global variable to manage it when the model changes

    last_loaded_model_path = pretrained_model_name_or_path  # Track the last loaded model path

    def load_model(pretrained_model_name_or_path):
        if pretrained_model_name_or_path.endswith(
            ".ckpt"
        ) or pretrained_model_name_or_path.endswith(".safetensors"):
            scheduler_kwargs = hf_hub_download(
                repo_id="wangqixun/YamerMIX_v8",
                subfolder="scheduler",
                filename="scheduler_config.json",
            )

            (tokenizers, text_encoders, unet, _, vae) = load_models_xl(
                pretrained_model_name_or_path=pretrained_model_name_or_path,
                scheduler_name=None,
                weight_dtype=dtype,
            )

            scheduler = diffusers.EulerDiscreteScheduler.from_config(scheduler_kwargs)
            pipe = StableDiffusionXLInstantIDPipeline(
                vae=vae,
                text_encoder=text_encoders[0],
                text_encoder_2=text_encoders[1],
                tokenizer=tokenizers[0],
                tokenizer_2=tokenizers[1],
                unet=unet,
                scheduler=scheduler,
                controlnet=[controlnet_identitynet],
            )

        else:
            pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
                pretrained_model_name_or_path,
                controlnet=[controlnet_identitynet],
                torch_dtype=dtype,
                safety_checker=None,
                feature_extractor=None,
            )

            pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(
                pipe.scheduler.config
            )
        return pipe

    print(f"Loading model: {pretrained_model_name_or_path}")
    pipe = load_model(pretrained_model_name_or_path)

    assign_last_params()

    def reload_pipe_if_needed(model_input, model_dropdown):
        nonlocal last_loaded_model_path

        # Trim the model_input to remove any leading or trailing whitespace
        model_input = model_input.strip() if model_input else None

        # Determine the model to load
        model_to_load = model_input if model_input else os.path.join('models', model_dropdown) if model_dropdown else None

        # Return early if no model is selected or inputted
        if not model_to_load:
            print("No model selected or inputted. Please select or input a model.")
            return

        # Proceed with reloading the model if it's different from the last loaded model
        if model_to_load != last_loaded_model_path:
            print(f"Reloading model: {model_to_load}")
            global pipe
            # Properly discard the old pipe if it exists
            if hasattr(pipe, 'scheduler'):
                del pipe.scheduler

            # Load the new model
            pipe = load_model(model_to_load)
            last_loaded_model_path = model_to_load
            assign_last_params()

FurkanGozukara commented 7 months ago

My team fixed enable_sequential_cpu_offload for the single-ControlNet app.py.

It is now working great.

We also fixed the image cropping function, so images are properly cropped and you don't get distorted faces.

Everything is shared on Patreon with a 1-click installer that also downloads the models automatically.

We are still working on enable_sequential_cpu_offload for multi-ControlNet.
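
For anyone curious, enable_sequential_cpu_offload is the standard diffusers call that streams weights to the GPU submodule by submodule, so it is slower than enable_model_cpu_offload but needs far less VRAM; the switch itself is one line (sketch):

# instead of pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()   # much lower VRAM, noticeably slower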

It also has a lot of features.

camoody1 commented 7 months ago

Fixed. Now you can use pipe.enable_model_cpu_offload(), for other optimization tricks, please let us know.

Is this a code change that I should make to one of the files? I want to make this run well, but my 12GB card really struggles on it.

FurkanGozukara commented 7 months ago

Fixed. Now you can use pipe.enable_model_cpu_offload(), for other optimization tricks, please let us know.

Is this a code change that I should make to one of the files? I want to make this run well, but my 12GB card really struggles on it.

Yes, with the code changes it is now working on 12 GB.

Sadly the InstantID developers are ignoring this; we had to spend a huge amount of time to fix it.

andypotato commented 3 months ago

This issue still occurs with the current repository. Is there an official fix, or do we have to join someone's paywalled Patreon to make it work on cards with less than 24 GB?