Open naelsen opened 1 month ago
@naelsen Here
Hello @samsara-ku, could you give me a step-by-step example of how to use this repo to run inference on my own data? I appreciate your help.
@naelsen Did you check gradio_demo/app.py? There is a pipeline there for custom data (i.e. data without a prepared agnostic mask and densepose). You'd better check it; I believe you can extend that code to handle many kinds of data.
@samsara-ku Yes, I checked, but I want to deploy a service that I will run on my own hardware. What should I look at for that?
images = pipe(
    prompt_embeds=prompt_embeds.to(device, torch.float16),
    negative_prompt_embeds=negative_prompt_embeds.to(device, torch.float16),
    pooled_prompt_embeds=pooled_prompt_embeds.to(device, torch.float16),
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds.to(device, torch.float16),
    num_inference_steps=denoise_steps,
    generator=generator,
    strength=1.0,
    pose_img=pose_img.to(device, torch.float16),
    text_embeds_cloth=prompt_embeds_c.to(device, torch.float16),
    cloth=garm_tensor.to(device, torch.float16),
    mask_image=mask,
    image=human_img,
    height=1024,
    width=768,
    ip_adapter_image=garm_img.resize((768, 1024)),
    guidance_scale=2.0,
)[0]
Basically, all you have to do is prepare the arguments of the pipeline above: 1) the prompt embeddings (e.g. prompt_embeds, negative_prompt_embeds, and so on), 2) pose_img, 3) text_embeds_cloth, 4) cloth, 5) mask_image, and 6) image.
The prompt embeddings are just the text converted to vectors with the tokenizer; check that part of the code. The same goes for text_embeds_cloth.
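As a rough sketch of that step, assuming the loaded pipeline exposes the standard diffusers SDXL encode_prompt() method (the prompt template strings below follow the style of gradio_demo/app.py but are illustrative, not verbatim):

```python
def encode_prompts(pipe, garment_description):
    # Sketch only: `pipe` is the loaded try-on pipeline. diffusers SDXL
    # pipelines expose encode_prompt(), which returns four tensors when
    # do_classifier_free_guidance=True.
    prompt = f"model is wearing {garment_description}"
    negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
    (prompt_embeds,
     negative_prompt_embeds,
     pooled_prompt_embeds,
     negative_pooled_prompt_embeds) = pipe.encode_prompt(
        prompt,
        num_images_per_prompt=1,
        do_classifier_free_guidance=True,
        negative_prompt=negative_prompt,
    )
    # text_embeds_cloth: a second prompt describing only the garment.
    prompt_embeds_c, *_ = pipe.encode_prompt(
        f"a photo of {garment_description}",
        num_images_per_prompt=1,
        do_classifier_free_guidance=False,
        negative_prompt=negative_prompt,
    )
    return (prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds,
            negative_pooled_prompt_embeds, prompt_embeds_c)
```

The returned tensors map one-to-one onto the embedding arguments of the pipe() call above.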
pose_img comes from args.func(args, human_img_arg), which is the output of densepose. In gradio_demo/app.py this output is already prepared, so just read how that code uses it.
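If your densepose step hands you an H x W x 3 uint8 image array, the tensor preprocessing is roughly the usual diffusers convention (scale to [-1, 1], channels first). A sketch, with a dummy array standing in for the real args.func(args, human_img_arg) output:

```python
import numpy as np

# Dummy stand-in for the densepose output of args.func(args, human_img_arg):
pose_arr = np.zeros((1024, 768, 3), dtype=np.uint8)

# Scale uint8 [0, 255] to float [-1, 1], move channels first, and add a batch
# dimension; afterwards convert with torch.from_numpy(...) and .to(device).
pose_img = (pose_arr.astype(np.float32) / 127.5) - 1.0
pose_img = pose_img.transpose(2, 0, 1)[None]
print(pose_img.shape)  # (1, 3, 1024, 768)
```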
garm_img is just the cloth image; it needs no pre-processing.
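The only thing the pipe() call above does with it is resize it to 768x1024 for the IP-Adapter input, matching the width=768 and height=1024 arguments. For example (with a dummy image in place of your file):

```python
from PIL import Image

# Dummy garment image; in practice use Image.open("your_cloth.jpg").convert("RGB").
garm_img = Image.new("RGB", (600, 800), "white")

# PIL's resize takes (width, height), matching width=768, height=1024 above.
ip_adapter_image = garm_img.resize((768, 1024))
print(ip_adapter_image.size)  # (768, 1024)
```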
mask_image comes from the get_mask_location function; in the VTON papers it is usually called the agnostic mask. All you need to know is that the agnostic mask comes from openpose. You need to run the openpose_model function on your custom dataset.
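Following the call pattern in gradio_demo/app.py (the function names, resize sizes, and argument values here are taken from that file and may differ in your checkout, so treat this as a sketch):

```python
def build_agnostic_mask(openpose_model, parsing_model, human_img):
    # Sketch of the agnostic-mask step from gradio_demo/app.py:
    # openpose_model extracts keypoints, parsing_model produces a human-parse
    # map, and get_mask_location combines them into the agnostic mask.
    from utils_mask import get_mask_location  # repo-local module

    keypoints = openpose_model(human_img.resize((384, 512)))
    model_parse, _ = parsing_model(human_img.resize((384, 512)))
    mask, mask_gray = get_mask_location("hd", "upper_body", model_parse, keypoints)
    # The pipeline runs at 768x1024, so bring the mask up to that size.
    return mask.resize((768, 1024)), mask_gray
```

The first returned value is what goes into the mask_image argument of the pipe() call above.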
I don't know what the human_img_arg function is, but just run it and pass the result to the pipe function.
Can someone tell me where I could find the weights of the model? I'm lost among all this information. Basically, I just want to run inference with the pre-trained model and I don't know how.