tencent-ailab / PCDMs

Implementation code: Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
Apache License 2.0
144 stars 8 forks

Images generated by Stage3 are noisy #23

Open RuijieH opened 1 month ago

RuijieH commented 1 month ago

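Hello authors, when I use the stage-2 images and stage-3 weights you provided to generate the stage-3 images, the generated images are noisy. Do you know what might be causing this? The launch command is as follows: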

CUDA_VISIBLE_DEVICES=4,5 python3 stage3_batchtest_refined_model.py \
  --img_weigh 512 \
  --img_height 512 \
 --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
 --image_encoder_p_path="facebook/dinov2-giant" \
 --img_path="./Deepfashion/test_lst_512_png/" \
 --json_path="./Deepfashion/test_data0_0.json" \
 --pose_path="./openpose_imgs/images/" \
 --gen_t_img_path="./PCDMs_Results/stage2_512_512_results/" \
 --save_path="./logs/view_stage3/512_512" \
 --weights_name="./pretrained_pcdms/Checkpoints/stage3_checkpoints/512/s3_512.pt" \
 --calculate_metrics

The generated images look like this:

WOMEN_Tees_Tanks_id_00004153_03_3_back_to_WOMEN_Tees_Tanks_id_00004153_03_4_full
WOMEN_Tees_Tanks_id_00000048_01_2_side_to_WOMEN_Tees_Tanks_id_00000048_01_7_additional
WOMEN_Rompers_Jumpsuits_id_00005463_01_2_side_to_WOMEN_Rompers_Jumpsuits_id_00005463_01_1_front
WOMEN_Shorts_id_00005138_03_1_front_to_WOMEN_Shorts_id_00005138_03_4_full

muzishen commented 1 month ago

Maybe cfg value is too big.

RuijieH commented 1 month ago

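Do you still have a record of this hyperparameter? I am currently using the default value of 2.0. Since I am using the stage-2 images and stage-3 weights you provided, I probably need the same hyperparameters you used in your experiments to reproduce the results in your paper. I would really appreciate your help!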

1llss commented 1 month ago

Did you run into the following problem when running the stage-3 test code? Could you help me with it?

vae_gen_t_image shape: torch.Size([1, 3, 512, 512])
Original gen_t_img_f shape: torch.Size([1, 4, 64, 64])
  0%|          | 0/20 [00:00<?, ?it/s]
vae_gen_t_image shape: torch.Size([1, 3, 512, 512])
  0%|          | 0/20 [00:00<?, ?it/s]
Process Process-1:
Traceback (most recent call last):
  File "D:\ana\envs\PCDMs\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "D:\ana\envs\PCDMs\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "D:\edge下载文件\姿态识别\PCDMs-main\stage3_batchtest_refined_model.py", line 161, in inference
    output = pipe(
  File "D:\ana\envs\PCDMs\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\edge下载文件\姿态识别\PCDMs-main\src\pipelines\stage3_refined_pipeline.py", line 547, in __call__
    noise_mask_maskedimage_latents = torch.cat([latent_model_input, gen_t_img_f], dim=1).to(dtype=torch.float32)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 2 for tensor number 1 in the list.
Original gen_t_img_f shape: torch.Size([1, 4, 64, 64])
  0%|          | 0/20 [00:00<?, ?it/s]
Process Process-2:
Traceback (most recent call last):
  File "D:\ana\envs\PCDMs\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "D:\ana\envs\PCDMs\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "D:\edge下载文件\姿态识别\PCDMs-main\stage3_batchtest_refined_model.py", line 161, in inference
    output = pipe(
  File "D:\ana\envs\PCDMs\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\edge下载文件\姿态识别\PCDMs-main\src\pipelines\stage3_refined_pipeline.py", line 547, in __call__
    noise_mask_maskedimage_latents = torch.cat([latent_model_input, gen_t_img_f], dim=1).to(dtype=torch.float32)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 2 for tensor number 1 in the list.

RuijieH commented 1 month ago

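The original code has a dimension mismatch involving num_images_per_prompt. I added the following two lines at lines 489 and 490 of stage3_refined_pipeline.py; you could give them a try. However, I am not sure whether this changes the original logic and causes the problem I described above.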

        feature_f = feature_f.repeat(bs * num_images_per_prompt, 1, 1).to(device, dtype=torch.float16)
        gen_t_img_f = gen_t_img_f.repeat(bs * num_images_per_prompt, 1, 1, 1).to(dtype=torch.float16, device=device)
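
A minimal sketch of the shape rule behind the RuntimeError above, written in plain Python so it runs without PyTorch. The helper names `cat_shape` and `repeat_batch` are hypothetical and only model how `torch.cat` validates sizes; they are not part of PCDMs. The batch sizes 8 and 2 come from the error message, and the 4 x 64 x 64 latent shape comes from the log. Reading 8 as 2 x 4 (a doubled batch combined with four images per prompt) is an assumption, not something confirmed in the thread.

```python
def cat_shape(shapes, dim):
    """Return the shape produced by concatenating along `dim`.

    Models the torch.cat rule: every axis except `dim` must match.
    """
    first = shapes[0]
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all tensors must have the same number of dimensions")
        for axis, (expected, got) in enumerate(zip(first, shape)):
            if axis != dim and expected != got:
                raise ValueError(
                    f"sizes must match except in dimension {dim}: "
                    f"expected {expected} but got {got} on axis {axis}"
                )
        total += shape[dim]
    out = list(first)
    out[dim] = total
    return tuple(out)


def repeat_batch(shape, times):
    """Shape after repeating a tensor `times` along the batch axis only."""
    return (shape[0] * times,) + tuple(shape[1:])


# Shapes taken from the log and error message above.
latent_model_input = (8, 4, 64, 64)
gen_t_img_f = (2, 4, 64, 64)

try:
    cat_shape([latent_model_input, gen_t_img_f], dim=1)
    mismatch = False
except ValueError:
    mismatch = True  # batch axis differs: 8 vs 2

# Assumed repeat factor of 4 (hypothetical num_images_per_prompt).
fixed = repeat_batch(gen_t_img_f, 4)
result = cat_shape([latent_model_input, fixed], dim=1)
print(mismatch, fixed, result)
```

Once the batch axes agree, concatenating along dim=1 stacks the channels, which is why the result has 8 channels in this sketch.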

muzishen commented 1 month ago

num_images_per_prompt seems to require that it be set to 1.

RuijieH commented 1 month ago

num_images_per_prompt seems to require that it be set to 1.

Yes, you are right. It runs normally when num_images_per_prompt is set to 1. Additionally, I found that the noise I mentioned above disappears when cfg is set to 1 (the default is 2). I would like to know if cfg was set to 1 when you obtained your metrics in your paper.
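
For context, a minimal sketch of the standard classifier-free guidance (cfg) combination, assuming the stage-3 pipeline follows the usual formulation; the thread does not confirm this. The function name `apply_cfg` and the sample values are hypothetical. It shows that a scale of 1 returns the conditional prediction unchanged, while larger scales push the result further away from the unconditional prediction.

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Hypothetical per-element noise predictions (not taken from the model).
uncond = [0.5, -0.25, 1.0]
cond = [1.0, 0.25, -0.5]

at_one = apply_cfg(uncond, cond, 1.0)  # equals the conditional prediction
at_two = apply_cfg(uncond, cond, 2.0)  # extrapolates past the conditional prediction
print(at_one, at_two)
```

With a scale of 1 the guidance term adds nothing beyond the conditional branch, so any artifacts that come from over-amplifying the difference between the two predictions cannot appear.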

muzishen commented 1 month ago

Congratulations!!! I am very sorry, it has been a while and I have left that job. Maybe you are right.

RuijieH commented 1 month ago

Thank you for your outstanding work. Best wishes!

On July 30, 2024, at 00:24, Fei Shen @.***> wrote:

Congratulations!!! I am very sorry, it has been a while and I have left that job. Maybe you are right.
