Open dm33tri opened 8 months ago
I'll take a look.
Was able to do something like this, but it's just a quick fix:
prompt_image_emb = prompt_image_emb.to(
    device=self.image_proj_model.latents.device,
    dtype=self.image_proj_model.latents.dtype
)
prompt_image_emb = self.image_proj_model(prompt_image_emb)
return prompt_image_emb.to(device=device, dtype=dtype)
Also force fp32 on image_proj_model in case of CPU.
May also force image_proj_model to cuda, but I think it will be slower on my 10GB card.
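The quick fix above boils down to one pattern: move the input to wherever the projection module's weights live, run the module, then hand the result back in the caller's device and dtype. Here is a minimal sketch of that pattern, assuming PyTorch is installed. `DeviceAlignedProjection` is a hypothetical wrapper, not part of InstantID, and a float64 input against a float32 `nn.Linear` on CPU stands in for the cuda/cpu and fp16/fp32 mismatch so the sketch runs without a GPU.

```python
import torch
from torch import nn


class DeviceAlignedProjection(nn.Module):
    """Hypothetical wrapper: run `proj` on the device/dtype of its own weights."""

    def __init__(self, proj: nn.Module):
        super().__init__()
        self.proj = proj

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Remember where the caller keeps its tensors.
        caller_device, caller_dtype = x.device, x.dtype
        # Align the input with the module's parameters before the matmul.
        param = next(self.proj.parameters())
        x = x.to(device=param.device, dtype=param.dtype)
        out = self.proj(x)
        # Return the result in the caller's device and dtype.
        return out.to(device=caller_device, dtype=caller_dtype)


proj = nn.Linear(8, 4)                         # float32 weights on CPU
aligned = DeviceAlignedProjection(proj)
emb = torch.randn(1, 8, dtype=torch.float64)   # deliberately mismatched dtype
out = aligned(emb)
print(out.shape, out.dtype, out.device)
```

The round trip costs two copies per call, which is presumably why it was described as a quick fix rather than a proper one.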
Thx
I cannot reproduce this problem.
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    base_model_path,
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
pipe.cuda()
pipe.load_ip_adapter_instantid(face_adapter)
pipe.enable_model_cpu_offload() can lower VRAM usage.
@dm33tri I am also getting this error. Can you help?
Here are my log and my code:
G:\instant id auto installer\venv\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Start inference...
[Debug] Prompt: watercolor painting, a man. vibrant, beautiful, painterly, detailed, textural, artistic,
[Debug] Neg Prompt: (lowres, low quality, worst quality:1.2), (text:1.2), watermark, anime, photorealistic, 35mm film, deformed, glitch, low contrast, noisy (lowres, low quality, worst quality:1.2), (text:1.2), watermark, (frame:1.2), deformed, ugly, deformed eyes, blur, out of focus, blurry, deformed cat, deformed, photo, anthropomorphic cat, monochrome, pet collar, gun, weapon, blue, 3d, drones, drone, buildings in background, green
Traceback (most recent call last):
File "G:\instant id auto installer\venv\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "G:\instant id auto installer\venv\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "G:\instant id auto installer\venv\lib\site-packages\gradio\blocks.py", line 1561, in process_api
result = await self.call_function(
File "G:\instant id auto installer\venv\lib\site-packages\gradio\blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "G:\instant id auto installer\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "G:\instant id auto installer\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "G:\instant id auto installer\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
File "G:\instant id auto installer\venv\lib\site-packages\gradio\utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "G:\instant id auto installer\venv\lib\site-packages\gradio\utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "G:\instant id auto installer\web-ui.py", line 216, in generate_image
images = pipe(
File "G:\instant id auto installer\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\instant id auto installer\pipeline_stable_diffusion_xl_instantid.py", line 522, in __call__
prompt_image_emb = self._encode_prompt_image_emb(image_embeds,
File "G:\instant id auto installer\pipeline_stable_diffusion_xl_instantid.py", line 235, in _encode_prompt_image_emb
prompt_image_emb = self.image_proj_model(prompt_image_emb)
File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "G:\instant id auto installer\venv\lib\site-packages\ip_adapter\resampler.py", line 135, in forward
x = self.proj_in(x)
File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "G:\instant id auto installer\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
MAX_SEED = np.iinfo(np.int32).max
device = get_torch_device()
dtype = torch.float16 if "cuda" in str(device) else torch.float32
STYLE_NAMES = list(styles.keys())
DEFAULT_STYLE_NAME = "Watercolor"
# Load face encoder
app = FaceAnalysis(name='antelopev2', root='checkpoints', providers=['CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
# Path to InstantID models
face_adapter = 'checkpoints/ip-adapter.bin'
controlnet_path = 'checkpoints/ControlNetModel'
# Load pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=dtype)
def main(pretrained_model_name_or_path="wangqixun/YamerMIX_v8"):
    if pretrained_model_name_or_path.endswith(
        ".ckpt"
    ) or pretrained_model_name_or_path.endswith(".safetensors"):
        scheduler_kwargs = hf_hub_download(
            repo_id="wangqixun/YamerMIX_v8",
            subfolder="scheduler",
            filename="scheduler_config.json",
        )
        (tokenizers, text_encoders, unet, _, vae) = load_models_xl(
            pretrained_model_name_or_path=pretrained_model_name_or_path,
            scheduler_name=None,
            weight_dtype=dtype,
        )
        scheduler = diffusers.EulerDiscreteScheduler.from_config(scheduler_kwargs)
        pipe = StableDiffusionXLInstantIDPipeline(
            vae=vae,
            text_encoder=text_encoders[0],
            text_encoder_2=text_encoders[1],
            tokenizer=tokenizers[0],
            tokenizer_2=tokenizers[1],
            unet=unet,
            scheduler=scheduler,
            controlnet=controlnet,
        )
    else:
        pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
            pretrained_model_name_or_path,
            controlnet=controlnet,
            torch_dtype=dtype,
            safety_checker=None,
            feature_extractor=None,
        )
        pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)
    pipe.enable_model_cpu_offload()
    pipe.load_ip_adapter_instantid(face_adapter)
    def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
        if randomize_seed:
            seed = random.randint(0, MAX_SEED)
        return seed

    def swap_to_gallery(images):
        return gr.update(value=images, visible=True), gr.update(visible=True), gr.update(visible=False)

    def upload_example_to_gallery(images, prompt, style, negative_prompt):
        return gr.update(value=images, visible=True), gr.update(visible=True), gr.update(visible=False)

    def remove_back_to_files():
        return gr.update(visible=False), gr.update(visible=False), gr.update(visible=True)

    def remove_tips():
        return gr.update(visible=False)

    def convert_from_cv2_to_image(img: np.ndarray) -> Image:
        return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    def convert_from_image_to_cv2(img: Image) -> np.ndarray:
        return cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    def draw_kps(image_pil, kps, color_list=[(255,0,0), (0,255,0), (0,0,255), (255,255,0), (255,0,255)]):
        stickwidth = 4
        limbSeq = np.array([[0, 2], [1, 2], [3, 2], [4, 2]])
        kps = np.array(kps)
        w, h = image_pil.size
        out_img = np.zeros([h, w, 3])
        # Draw one ellipse per limb, connecting each keypoint to keypoint 2
        for i in range(len(limbSeq)):
            index = limbSeq[i]
            color = color_list[index[0]]
            x = kps[index][:, 0]
            y = kps[index][:, 1]
            length = ((x[0] - x[1]) ** 2 + (y[0] - y[1]) ** 2) ** 0.5
            angle = math.degrees(math.atan2(y[0] - y[1], x[0] - x[1]))
            polygon = cv2.ellipse2Poly((int(np.mean(x)), int(np.mean(y))), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
            out_img = cv2.fillConvexPoly(out_img.copy(), polygon, color)
        out_img = (out_img * 0.6).astype(np.uint8)
        # Draw a filled circle on each keypoint
        for idx_kp, kp in enumerate(kps):
            color = color_list[idx_kp]
            x, y = kp
            out_img = cv2.circle(out_img.copy(), (int(x), int(y)), 10, color, -1)
        out_img_pil = Image.fromarray(out_img.astype(np.uint8))
        return out_img_pil
    def resize_img(input_image, max_side=1280, min_side=1024, size=None,
                   pad_to_max_side=False, mode=PIL.Image.BILINEAR, base_pixel_number=64):
        w, h = input_image.size
        if size is not None:
            w_resize_new, h_resize_new = size
        else:
            ratio = min_side / min(h, w)
            w, h = round(ratio*w), round(ratio*h)
            ratio = max_side / max(h, w)
            input_image = input_image.resize([round(ratio*w), round(ratio*h)], mode)
            # Snap both sides down to a multiple of base_pixel_number
            w_resize_new = (round(ratio * w) // base_pixel_number) * base_pixel_number
            h_resize_new = (round(ratio * h) // base_pixel_number) * base_pixel_number
        input_image = input_image.resize([w_resize_new, h_resize_new], mode)
        if pad_to_max_side:
            res = np.ones([max_side, max_side, 3], dtype=np.uint8) * 255
            offset_x = (max_side - w_resize_new) // 2
            offset_y = (max_side - h_resize_new) // 2
            res[offset_y:offset_y+h_resize_new, offset_x:offset_x+w_resize_new] = np.array(input_image)
            input_image = Image.fromarray(res)
        return input_image
    def apply_style(style_name: str, positive: str, negative: str = "") -> tuple[str, str]:
        p, n = styles.get(style_name, styles[DEFAULT_STYLE_NAME])
        return p.replace("{prompt}", positive), n + ' ' + negative

    def generate_image(face_image, pose_image, prompt, negative_prompt, style_name, num_steps, identitynet_strength_ratio, adapter_strength_ratio, guidance_scale, seed, progress=gr.Progress(track_tqdm=True)):
        if face_image is None:
            raise gr.Error("Cannot find any input face image! Please upload the face image")
        if prompt is None:
            prompt = "a person"
        # apply the style template
        prompt, negative_prompt = apply_style(style_name, prompt, negative_prompt)
        face_image = load_image(face_image[0])
        face_image = resize_img(face_image)
        face_image_cv2 = convert_from_image_to_cv2(face_image)
        height, width, _ = face_image_cv2.shape
        # Extract face features
        face_info = app.get(face_image_cv2)
        if len(face_info) == 0:
            raise gr.Error("Cannot find any face in the image! Please upload another person image")
        # only use the maximum face (width * height of the bounding box)
        face_info = sorted(face_info, key=lambda x: (x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]
        face_emb = face_info['embedding']
        face_kps = draw_kps(convert_from_cv2_to_image(face_image_cv2), face_info['kps'])
        if pose_image is not None:
            pose_image = load_image(pose_image[0])
            pose_image = resize_img(pose_image)
            pose_image_cv2 = convert_from_image_to_cv2(pose_image)
            face_info = app.get(pose_image_cv2)
            if len(face_info) == 0:
                raise gr.Error("Cannot find any face in the reference image! Please upload another person image")
            face_info = face_info[-1]
            face_kps = draw_kps(pose_image, face_info['kps'])
            width, height = face_kps.size
        generator = torch.Generator(device=device).manual_seed(seed)
        print("Start inference...")
        print(f"[Debug] Prompt: {prompt}, \n[Debug] Neg Prompt: {negative_prompt}")
        pipe.set_ip_adapter_scale(adapter_strength_ratio)
        images = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image_embeds=face_emb,
            image=face_kps,
            controlnet_conditioning_scale=float(identitynet_strength_ratio),
            num_inference_steps=num_steps,
            guidance_scale=guidance_scale,
            height=height,
            width=width,
            generator=generator
        ).images
        return images, gr.update(visible=True)
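A side note on the "only use the maximum face" step in the script: choosing the largest face means ranking by bounding-box area, `(x2 - x1) * (y2 - y1)`, and the parentheses around the height matter. Without them, operator precedence turns the key into `width * y2 - y1`, which can rank a small face near the bottom of the image above a larger one. A small pure-Python sketch of the difference follows. The helper names and sample boxes are made up for illustration, and the boxes are assumed to be `[x1, y1, x2, y2]`.

```python
def bbox_area(bbox):
    """Area of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox[:4]
    return (x2 - x1) * (y2 - y1)


def largest_face(faces):
    """Return the detection whose bounding box covers the most pixels."""
    return max(faces, key=lambda face: bbox_area(face["bbox"]))


# Illustrative detections only; real ones would come from the face detector.
faces = [
    {"name": "small", "bbox": [0, 0, 10, 10]},          # area 100
    {"name": "low_in_frame", "bbox": [0, 500, 20, 520]},  # area 400
    {"name": "large", "bbox": [100, 100, 150, 150]},      # area 2500
]

# The same ranking with the parentheses around the height left out.
unparenthesized = max(
    faces,
    key=lambda f: (f["bbox"][2] - f["bbox"][0]) * f["bbox"][3] - f["bbox"][1],
)

print(largest_face(faces)["name"])   # large
print(unparenthesized["name"])       # low_in_frame
```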
prompt_image_emb = self.image_proj_model(prompt_image_emb)
Can you send me your modified file, please? I can't make it work.
pipe.enable_model_cpu_offload()
because your code is still moving it to the GPU, @ResearcherXman:
It seems like you have activated model offloading by calling enable_model_cpu_offload, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, text_encoder, text_encoder_2, tokenizer, tokenizer_2, unet, controlnet, scheduler, feature_extractor, image_encoder to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: pipeline.to('cpu') or removing the move altogether if you use offloading.
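The warning amounts to a simple rule: either enable offloading or move the pipeline to the GPU yourself, but not both. A minimal sketch of a setup helper that enforces this is shown below. `place_pipeline` and the recording stub are hypothetical and exist only so the example runs without a GPU or model weights. The three pipeline methods it calls are the ones already used in this thread, and the position of `load_ip_adapter_instantid` simply mirrors the snippets above rather than a verified requirement.

```python
class RecordingPipe:
    """Stand-in for the real pipeline; it only records which methods were called."""

    def __init__(self):
        self.calls = []

    def enable_model_cpu_offload(self):
        self.calls.append("enable_model_cpu_offload")

    def to(self, device):
        self.calls.append(f"to({device})")
        return self

    def load_ip_adapter_instantid(self, path):
        self.calls.append("load_ip_adapter_instantid")


def place_pipeline(pipe, face_adapter, low_vram, device="cuda"):
    """Hypothetical helper: pick offloading OR a manual move, never both."""
    if low_vram:
        # Offloading moves each component to the GPU only when it is needed.
        pipe.enable_model_cpu_offload()
    else:
        # Enough VRAM: keep the whole pipeline resident on the device.
        pipe.to(device)
    pipe.load_ip_adapter_instantid(face_adapter)
    return pipe


offloaded = place_pipeline(RecordingPipe(), "checkpoints/ip-adapter.bin", low_vram=True)
resident = place_pipeline(RecordingPipe(), "checkpoints/ip-adapter.bin", low_vram=False)
print(offloaded.calls)
print(resident.calls)
```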
prompt_image_emb = prompt_image_emb.to(
    device=self.image_proj_model.latents.device,
    dtype=self.image_proj_model.latents.dtype
)
prompt_image_emb = self.image_proj_model(prompt_image_emb)
return prompt_image_emb.to(device=device, dtype=dtype)
You modified the entire _encode_prompt_image_emb function, right? Can you share it here? That is the culprit for me too.
Fixed. Now you can use pipe.enable_model_cpu_offload(). For other optimization tricks, please let us know.
I have the multi-ControlNet version with an improved Gradio UI and a 1-click installer. I am still working on it, but it requires 24 GB.
Could you share your minimal script to reproduce this error?
It is basically your multi-ControlNet web UI; I just changed the model loading logic.
def get_model_names():
    models_dir = 'models'
    if not os.path.exists(models_dir):
        os.makedirs(models_dir)
    model_files = [f for f in os.listdir(models_dir) if f.endswith('.safetensors')]
    return model_files

def assign_last_params():
    global pipe
    pipe.enable_model_cpu_offload()
    pipe.to(device)
    pipe.load_ip_adapter_instantid(face_adapter)
    pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)
    # load and disable LCM
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
    pipe.disable_lora()
    print("Model loaded successfully.")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
def main(pretrained_model_name_or_path="wangqixun/YamerMIX_v8", enable_lcm_arg=False):
    global pipe  # Declare pipe as a global variable to manage it when the model changes
    last_loaded_model_path = pretrained_model_name_or_path  # Track the last loaded model path

    def load_model(pretrained_model_name_or_path):
        if pretrained_model_name_or_path.endswith(
            ".ckpt"
        ) or pretrained_model_name_or_path.endswith(".safetensors"):
            scheduler_kwargs = hf_hub_download(
                repo_id="wangqixun/YamerMIX_v8",
                subfolder="scheduler",
                filename="scheduler_config.json",
            )
            (tokenizers, text_encoders, unet, _, vae) = load_models_xl(
                pretrained_model_name_or_path=pretrained_model_name_or_path,
                scheduler_name=None,
                weight_dtype=dtype,
            )
            scheduler = diffusers.EulerDiscreteScheduler.from_config(scheduler_kwargs)
            pipe = StableDiffusionXLInstantIDPipeline(
                vae=vae,
                text_encoder=text_encoders[0],
                text_encoder_2=text_encoders[1],
                tokenizer=tokenizers[0],
                tokenizer_2=tokenizers[1],
                unet=unet,
                scheduler=scheduler,
                controlnet=[controlnet_identitynet],
            )
        else:
            pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
                pretrained_model_name_or_path,
                controlnet=[controlnet_identitynet],
                torch_dtype=dtype,
                safety_checker=None,
                feature_extractor=None,
            )
            pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(
                pipe.scheduler.config
            )
        return pipe

    print(f"Loading model: {pretrained_model_name_or_path}")
    pipe = load_model(pretrained_model_name_or_path)
    assign_last_params()
    def reload_pipe_if_needed(model_input, model_dropdown):
        nonlocal last_loaded_model_path
        # Trim the model_input to remove any leading or trailing whitespace
        model_input = model_input.strip() if model_input else None
        # Determine the model to load
        model_to_load = model_input if model_input else os.path.join('models', model_dropdown) if model_dropdown else None
        # Return early if no model is selected or inputted
        if not model_to_load:
            print("No model selected or inputted. Please select or input a model.")
            return
        # Proceed with reloading the model if it's different from the last loaded model
        if model_to_load != last_loaded_model_path:
            print(f"Reloading model: {model_to_load}")
            global pipe
            # Properly discard the old pipe if it exists
            if hasattr(pipe, 'scheduler'):
                del pipe.scheduler
            # Load the new model
            pipe = load_model(model_to_load)
            last_loaded_model_path = model_to_load
            assign_last_params()
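A note on the reload step in the script: deleting one attribute such as `pipe.scheduler` does not release the old pipeline. The old object stays alive until the last reference to it is gone, so with the "load new, then rebind" order both pipelines exist at the same time during the swap. A minimal sketch of dropping the old reference first is shown below, in pure Python with a dummy class standing in for the real pipeline. `reload_if_needed`, `DummyPipeline` and the `state` dict are hypothetical names. On a CUDA machine, the `torch.cuda.empty_cache()` call from the script could follow the `gc.collect()`.

```python
import gc
import weakref


class DummyPipeline:
    """Stand-in for a loaded pipeline; only remembers where it came from."""

    def __init__(self, path):
        self.path = path


def reload_if_needed(state, path, loader):
    """Swap state['pipe'] for a freshly loaded one only when the path changes."""
    if path == state["path"]:
        return False
    # Drop the only reference to the old pipeline BEFORE loading the new one,
    # so the two never have to be held in memory together.
    state["pipe"] = None
    gc.collect()
    state["pipe"] = loader(path)
    state["path"] = path
    return True


state = {"pipe": None, "path": None}
reload_if_needed(state, "models/a.safetensors", DummyPipeline)
old_ref = weakref.ref(state["pipe"])  # lets us observe when the old object dies

same_path_reloaded = reload_if_needed(state, "models/a.safetensors", DummyPipeline)
new_path_reloaded = reload_if_needed(state, "models/b.safetensors", DummyPipeline)
print(same_path_reloaded, new_path_reloaded, old_ref() is None)
```

This only works if nothing else holds a reference to the old pipeline, for example a closure or a UI callback that captured it.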
My team fixed enable_sequential_cpu_offload for the single-ControlNet app.py, and it now works great.
We also fixed the image cropping function, so images are now cropped properly and you don't get distorted faces.
It is all shared on Patreon with a 1-click installer that also downloads the models automatically.
We are still working on enable_sequential_cpu_offload for multi-ControlNet.
It also has a lot of features.
Fixed. Now you can use pipe.enable_model_cpu_offload(). For other optimization tricks, please let us know.
Is this a code change that I should make to one of the files? I want to make this run well, but my 12GB card really struggles on it.
Yes, it needs code changes; it is now working on 12 GB.
Sadly, the developers of InstantID are ignoring this. We had to spend a huge amount of time to fix it.
This issue still occurs with the current repository. Is there an official fix, or do we have to join someone's paywalled Patreon to make it work on cards with less than 24 GB?
When run with pipe.enable_model_cpu_offload(), it shows an error.
CPU offloading drastically speeds up generation with low VRAM. Can it be implemented?