iPERDance / iPERCore

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis
https://iperdance.github.io/work/impersonator-plus-plus.html
Apache License 2.0
2.42k stars 312 forks

A question for the authors: the generated video is not sharp and looks very blurry. What causes this, and how can I improve it? #50

Closed anguoKuang closed 3 years ago

piaozhx commented 3 years ago

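Hey, when you ask a question, could you give a bit more detail (the command you ran, configuration, parameters, source video, and so on)? With nothing to go on, how are we supposed to help you find the cause...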

anguoKuang commented 3 years ago

Sorry about that. Here are the command I ran and the log:

A:\Anaconda\envs\IPERcore-main\python.exe A:/py_workspace/iPERCore-main2/iPERCore-main/demo/motion_imitate.py --gpu_ids 0 --image_size 256 --num_source 2 --output_dir ../results --assets_dir ../assets --model_id donald_trump_2 --src_path path?=../assets/samples/sources/donald_trump_2/000000.PNG,name?=donald_trump_2 --ref_path path?=../assets/samples/references/akun_2.mp4,name?=akun_2,pose_fc?=300

../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffmpeg.exe -y -i ../assets/samples/references/akun_2.mp4 -ab 160k -ac 2 -ar 44100 -vn ../results\primitives\akun_2\processed\audio.mp3 -loglevel quiet
../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffprobe.exe -v error -select_streams v -of default=noprint_wrappers=1:nokey=1 -show_entries stream=r_frame_rate ../assets/samples/references/akun_2.mp4
------------ Options -------------
{'MAX_NUM_SOURCE': 8, 'MultiMedia': {'ffmpeg': {'Linux': {'ffmpeg_exe_path': 'ffmpeg', 'ffprobe_exe_path': 'ffprobe'}, 'Windows': {'ffmpeg_exe_path': '../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffmpeg.exe', 'ffprobe_exe_path': '../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffprobe.exe'}, 'pix_fmt': 'yuv420p', 'vcodec': 'h264'}, 'image': {'caption': 'this is a fake video, synthesized by ' 'impersonator++', 'saved_nameformat': 'pred{:0>8}.png'}}, 'NUMBER_FACES': 13776, 'NUMBER_VERTS': 6890, 'Preprocess': {'BackgroundInpaintor': {'bg_replace': True, 'cfg_path': '../assets/configs/inpaintors/mmedit_inpainting.toml', 'dilate_iter_num': 3, 'dilate_kernel_size': 9, 'name': 'mmedit_inpainting', 'use_sr': True}, 'Cropper': {'ref_crop_factor': 3.0, 'src_crop_factor': 1.3}, 'Deformer': {'cloth_parse_ckpt_path': '../assets/checkpoints/mattors/exp-schp-lip.pth'}, 'FrontInfo': {'NUM_CANDIDATE': 25, 'RENDER_SIZE': 256}, 'HumanMattors': {'cfg_path': '../assets/configs/mattors/point_render+gca.toml', 'dilate_iter_num': 7, 'erode_iter_num': 2, 'morph_kernel_size': 3, 'name': 'point_render+gca'}, 'MAX_PER_GPU_PROCESS': 1, 
'Pose2dEstimator': {'cfg_path': '../assets/configs/pose2d/openpose/body25.toml', 'joint_type': 'OpenPose-Body-25', 'name': 'openpose'}, 'Pose3dEstimator': {'batch_size': 32, 'cfg_path': '../assets/configs/pose3d/spin.toml', 'name': 'spin', 'num_workers': 4}, 'Pose3dRefiner': {'cfg_path': '../assets/configs/pose3d/smplify.toml', 'name': 'smplify', 'use_lfbgs': True}, 'Tracker': {'tracker_name': 'max_box'}, 'estimate_boxes_first': True, 'filter_invalid': True, 'has_detector': True, 'temporal': True, 'use_smplify': True}, 'Train': {'D_adam_b1': 0.9, 'D_adam_b2': 0.999, 'G_adam_b1': 0.9, 'G_adam_b2': 0.999, 'aug_bg': False, 'display_freq_s': 30, 'face_factor': 1.0, 'face_loss_path': '../assets/checkpoints/losses/sphere20a_20171020.pth', 'final_lr': 2e-06, 'lambda_D_prob': 1.0, 'lambda_face': 5.0, 'lambda_mask': 5.0, 'lambda_mask_smooth': 1.0, 'lambda_rec': 10.0, 'lambda_tsf': 10.0, 'lr_D': 0.0001, 'lr_G': 0.0001, 'niters_or_epochs_decay': 0, 'niters_or_epochs_no_decay': 100, 'num_iters_validate': 1, 'opti': 'Adam', 'print_freq_s': 30, 'save_latest_freq_s': 300, 'tb_visual': False, 'train_G_every_n_iterations': 1, 'use_face': True, 'use_vgg': 'VGG19', 'vgg_loss_path': '../assets/checkpoints/losses/vgg19-dcbb9e9d.pth'}, 'assets_dir': '../assets', 'batch_size': 1, 'bg_ks': 11, 'cam_strategy': 'smooth', 'cfg_path': '../assets/configs/deploy.toml', 'digital_type': 'cloth_smpl_link', 'dis_name': 'patch_global', 'face_path': '../assets/checkpoints/pose3d/smpl_faces.npy', 'facial_path': '../assets/checkpoints/pose3d/front_facial.json', 'fim_enc_path': '../assets/checkpoints/pose3d/mapper_fim_enc.txt', 'front_path': '../assets/checkpoints/pose3d/front_body.json', 'ft_ks': 1, 'gen_name': 'AttLWB-SPADE', 'gpu_ids': '0', 'head_path': '../assets/checkpoints/pose3d/head.json', 'image_size': 256, 'intervals': 1, 'ip': '', 'load_epoch': -1, 'load_path_D': 'None', 'load_path_G': '../assets/checkpoints/neural_renders/AttLWB-SPADE_id_G_2020-05-18.pth', 'local_rank': 0, 'map_name': 
'uv_seg', 'meta_data': {'checkpoints_dir': '../results\models\donald_trump_2', 'meta_ref': [<iPERCore.services.options.meta_info.MetaProcess object at 0x000001D9A8D6B5F8>], 'meta_src': [<iPERCore.services.options.meta_info.MetaProcess object at 0x000001D9A8D6B9B0>], 'opt_path': '../results\models\donald_trump_2\opts.txt', 'personalized_ckpt_path': '../results\models\donald_trump_2\personalized.pth', 'root_primitives_dir': '../results\primitives'}, 'model_id': 'donald_trump_2', 'neural_render_cfg': {'Discriminator': {'bg_cond_nc': 4, 'cond_nc': 6, 'max_nf_mult': 8, 'n_layers': 4, 'name': 'patch_global', 'ndf': 64, 'norm_type': 'instance', 'use_sigmoid': False}, 'Generator': {'BGNet': {'cond_nc': 4, 'n_res_block': 6, 'norm_type': 'instance', 'num_filters': [64, 128, 128, 256]}, 'SIDNet': {'cond_nc': 6, 'n_res_block': 6, 'norm_type': 'None', 'num_filters': [64, 128, 256]}, 'TSFNet': {'cond_nc': 6, 'n_res_block': 6, 'norm_type': 'instance', 'num_filters': [64, 128, 256]}, 'name': 'AttLWB-SPADE'}}, 'neural_render_cfg_path': '../assets/configs/neural_renders/AttLWB-SPADE.toml', 'num_source': 2, 'num_workers': 4, 'only_vis': False, 'output_dir': '../results', 'part_path': '../assets/checkpoints/pose3d/smpl_part_info.json', 'port': 0, 'ref_path': 'path?=../assets/samples/references/akun_2.mp4,name?=akun_2,pose_fc?=300', 'serial_batches': False, 'share_bg': True, 'smpl_model': '../assets/checkpoints/pose3d/smpl_model.pkl', 'smpl_model_hand': '../assets/checkpoints/pose3d/smpl_model_with_hand_v2.pkl', 'src_path': 'path?=../assets/samples/sources/donald_trump_2/000000.PNG,name?=donald_trump_2', 'tb_visual': False, 'temporal': False, 'tex_size': 3, 'time_step': 1, 'train_name': 'LWGTrainer', 'use_cudnn': False, 'use_inpaintor': False, 'uv_map_path': '../assets/checkpoints/pose3d/mapper_uv.txt', 'verbose': True} -------------- End ---------------- Pre-processing: start... 
----------------------MetaProcess---------------------- meta_input: path: ../assets/samples/sources/donald_trump_2/000000.PNG bg_path: name: donald_trump_2 primitives_dir: ../results\primitives\donald_trump_2 processed_dir: ../results\primitives\donald_trump_2\processed vid_info_path: ../results\primitives\donald_trump_2\processed\vid_info.pkl

----------------------MetaProcess---------------------- meta_input: path: ../assets/samples/references/akun_2.mp4 bg_path: name: akun_2 audio: ../results\primitives\akun_2\processed\audio.mp3 fps: 30.0 pose_fc: 300.0 cam_fc: 100 primitives_dir: ../results\primitives\akun_2 processed_dir: ../results\primitives\akun_2\processed vid_info_path: ../results\primitives\akun_2\processed\vid_info.pkl

1.1 Preprocessing, running Preprocessor to detect the human boxes of ../results\primitives\donald_trump_2\processed\orig_images...

100%|██████████| 1/1 [01:13<00:00, 73.53s/it] 1.1 Preprocessing, finish detect the human boxes of ../results\primitives\donald_trump_2\processed\orig_images ... 1.2 Preprocessing, cropping all images in ../results\primitives\donald_trump_2\processed\orig_images by estimated boxes ... 1it [00:01, 1.90s/it] 0%| | 0/1 [00:00<?, ?it/s] 1.2 Preprocessing, finish crop the human by boxes, and save them in ../results\primitives\donald_trump_2\processed\images ... 1.3 Preprocessing, running Preprocessor to 3D pose estimation of all images in../results\primitives\donald_trump_2\processed\images ... 100%|██████████| 1/1 [00:09<00:00, 9.46s/it] 1.3 Preprocessing, finish 3D pose estimation successfully .... 1.4 Preprocessing, running Preprocessor to find 25 candidates front images in ../results\primitives\donald_trump_2\processed\images ... 0%| | 0/1 [00:00<?, ?it/s] 1.4 Preprocessing, finish find the front images .... 100%|██████████| 1/1 [00:00<00:00, 1.04it/s] 1.5 Preprocessing, running Preprocessor to run human matting in ../results\primitives\donald_trump_2\processed\parse ... 0%| | 0/1 [00:00<?, ?it/s]A:\Anaconda\envs\IPERcore-main\lib\site-packages\torch\nn\functional.py:3000: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed " A:\Anaconda\envs\IPERcore-main\lib\site-packages\mmedit\models\common\gca_module.py:244: UserWarning: Mixed memory format inputs detected while calling the operator. The operator will output contiguous tensor even if some of the inputs are in channels_last format. (Triggered internally at ..\aten\src\ATen\native\TensorIterator.cpp:918.) 
out = out + self_mask * unknown_ps
1.5 Preprocessing, finish run human matting.
100%|██████████| 1/1 [00:00<00:00, 1.01it/s]
1.6 Preprocessing, running Preprocessor to run background inpainting ...
0%| | 0/1 [00:00<?, ?it/s]
1.6 Preprocessing, finish run background inpainting ....
100%|██████████| 1/1 [00:00<00:00, 1.43it/s]
1.7 Preprocessing, saving visualization to ../results\primitives\donald_trump_2\processed\visual.mp4 ...
A:\py_workspace\iPERCore-main2\iPERCore-main\iPERCore\tools\utils\visualizers\smpl_visualizer.py:57: UserWarning: Mixed memory format inputs detected while calling the operator. The operator will output channels_last tensor even if some of the inputs are not in channels_last format. (Triggered internally at ..\aten\src\ATen\native\TensorIterator.cpp:924.)
masked_img = imgs * (1 - sil) + rd_imgs * sil
100%|██████████| 1/1 [00:00<00:00, 3.87it/s]
../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffmpeg.exe -y -i ../results\primitives\donald_trump_2\processed\visual.mp4.avi -vcodec h264 ../results\primitives\donald_trump_2\processed\visual.mp4 -loglevel quiet
1.7 Preprocessing, saving visualization to ../results\primitives\donald_trump_2\processed\visual.mp4 ...
Preprocessor has finished...
../assets/samples/references/akun_2.mp4
Writing frames to file
../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffmpeg.exe -i ../assets/samples/references/akun_2.mp4 -start_number 0 ../results\primitives\akun_2\processed\orig_images/frame_%08d.png
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 10.2.1 (GCC) 20200726
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libgsm --disable-w32threads --enable-libmfx --enable-ffnvcodec --enable-cuda-llvm --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '../assets/samples/references/akun_2.mp4':
Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.45.100
Duration: 00:00:07.34, start: 0.000000, bitrate: 1673 kb/s
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 1543 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata: handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)
Metadata: handler_name : SoundHandler
Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> png (native))
Press [q] to stop, [?] for help
Output #0, image2, to '../results\primitives\akun_2\processed\orig_images/frame_%08d.png':
Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.45.100
Stream #0:0(eng): Video: png, rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc (default)
Metadata: handler_name : VideoHandler encoder : Lavc58.91.100 png
frame= 219 fps= 49 q=-0.0 Lsize=N/A time=00:00:07.30 bitrate=N/A speed=1.64x
video:106657kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown 1.1 Preprocessing, running Preprocessor to detect the human boxes of ../results\primitives\akun_2\processed\orig_images... 100%|█████████▉| 218/219 [00:31<00:00, 6.99it/s] 1.1 Preprocessing, finish detect the human boxes of ../results\primitives\akun_2\processed\orig_images ... 100%|██████████| 219/219 [00:31<00:00, 6.91it/s] 1.2 Preprocessing, cropping all images in ../results\primitives\akun_2\processed\orig_images by estimated boxes ... 219it [00:04, 50.48it/s] 0%| | 0/7 [00:00<?, ?it/s] 1.2 Preprocessing, finish crop the human by boxes, and save them in ../results\primitives\akun_2\processed\images ... 1.3 Preprocessing, running Preprocessor to 3D pose estimation of all images in../results\primitives\akun_2\processed\images ... 100%|██████████| 7/7 [00:12<00:00, 1.80s/it] 1.3 Preprocessing, finish 3D pose estimation successfully .... Preprocessor has finished... Pre-processing: digital deformation start... 0%| | 0/1 [00:01<?, ?it/s] Pre-processing: digital deformation completed... the current number of sources are 1, while the pre-defined number of sources are 2. Pre-processing: successfully... Step 2: running personalization on

train video clips = 1

0%| | 0/100 [00:00<?, ?it/s]Network AttLWB-SPADE was created Network patch_global was created Loading vgg19 from ../assets/checkpoints/losses/vgg19-dcbb9e9d.pth... Loading face model from ../assets/checkpoints/losses/sphere20a_20171020.pth Loading net: ../assets/checkpoints/neural_renders/AttLWB-SPADE_id_G_2020-05-18.pth A:\Anaconda\envs\IPERcore-main\lib\site-packages\torch\nn\functional.py:3384: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details. warnings.warn("Default grid_sample and affine_grid behavior has changed " 100%|██████████| 100/100 [00:40<00:00, 2.49it/s] saving the personalized model in ../results\models\donald_trump_2\personalized.pth Step 2: personalization done, saved in ../results\models\donald_trump_2\personalized.pth... Step 3: running imitator. Network AttLWB-SPADE was created Loading net from ../results\models\donald_trump_2\personalized.pth Model Imitator was created A:\Anaconda\envs\IPERcore-main\lib\site-packages\torch\nn\functional.py:3384: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details. 
warnings.warn("Default grid_sample and affine_grid behavior has changed " 100%|██████████| 219/219 [00:09<00:00, 24.12it/s] 219it [00:02, 89.10it/s] ../assets/executables/ffmpeg-4.3.1-win64-static/bin/ffmpeg.exe -y -i ../results\primitives\donald_trump_2\synthesis\imitations\donald_trump_2-akun_2.mp4.avi -i ../results\primitives\akun_2\processed\audio.mp3 -vcodec h264 -shortest -strict -2 ../results\primitives\donald_trump_2\synthesis\imitations\donald_trump_2-akun_2.mp4 -loglevel quiet ----------------------MetaOutput---------------------- donald_trump_2 imitates akun_2 in ../results\primitives\donald_trump_2\synthesis\imitations\donald_trump_2-akun_2.mp4

Step 3: running imitator done.

Process finished with exit code 0

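The source data is just that single Trump image. Below is a screenshot of the result; it is very blurry. Could someone point out where the problem is? image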

StevenLiuWen commented 3 years ago

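@anguoKuang, results at 256 x 256 will be somewhat blurry. One reason is that at 256 the face occupies only a small fraction of the whole image, so the face region comes out slightly blurred. Another possible reason is that the pretrained base model was trained at 512 x 512. If you can (that is, if your GPU has enough memory), switch to 512 or a higher resolution.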
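A minimal sketch of that change, assuming the same command as in the log above with only `--image_size` raised to 512. The flags and paths are copied from that command; the helper name `build_imitate_command`, the plain `python` interpreter, and running from the `demo` directory are assumptions made for illustration.

```python
def build_imitate_command(image_size=512):
    """Rebuild the motion_imitate.py call from this thread with a new --image_size.

    Assumption: the command is launched from the repository's demo directory,
    which is why the relative ../assets and ../results paths are kept as-is.
    """
    return [
        "python", "motion_imitate.py",
        "--gpu_ids", "0",
        "--image_size", str(image_size),  # 256 in the original run
        "--num_source", "2",
        "--output_dir", "../results",
        "--assets_dir", "../assets",
        "--model_id", "donald_trump_2",
        "--src_path",
        "path?=../assets/samples/sources/donald_trump_2/000000.PNG,name?=donald_trump_2",
        "--ref_path",
        "path?=../assets/samples/references/akun_2.mp4,name?=akun_2,pose_fc?=300",
    ]


command = build_imitate_command(512)
# None of the arguments contain spaces, so a plain join is readable here.
print(" ".join(command))
# To actually launch it: import subprocess; subprocess.run(command, check=True)
```

Going from 256 to 512 quadruples the number of pixels per frame (65,536 to 262,144), so GPU memory use and run time should be expected to rise accordingly.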

anguoKuang commented 3 years ago

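Thanks a lot. After raising the size to 512 the result is indeed much sharper. Overall, though, it still looks unnatural, and many details are blurry and distorted. Do the authors have any way to improve this?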

piaozhx commented 3 years ago

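@anguoKuang The more accurately the SMPL estimate matches the source image, the better the result. If the body shape in your source image differs too much from real human proportions (for example a cartoon character, someone wearing a skirt, or an afro hairstyle), the result will be poor. Also, if you mean the trump example, that result is already tuned about as far as it can go; after all, there is only one photo.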

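If you want to make a demo that looks as good as possible, you need to provide photos that cover as many viewing angles as possible (front, back, side, and so on).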
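A rough sketch of how multi-view photos might be passed in, under two assumptions that this thread does not confirm: that `path?=` in `--src_path` may point to a folder of images rather than a single file, and that `--num_source` should match the number of photos. The helper names and the folder layout are hypothetical; the cap of 8 comes from `MAX_NUM_SOURCE` in the options dump above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
MAX_NUM_SOURCE = 8  # value printed in the Options dump of the log above


def count_source_images(src_dir):
    """Count image files (by extension) directly inside src_dir."""
    return sum(
        1 for p in Path(src_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def build_src_args(src_dir, name, max_num_source=MAX_NUM_SOURCE):
    """Return --num_source / --src_path arguments for a folder of photos.

    Assumption: a folder is accepted after 'path?='; check the project
    documentation before relying on this.
    """
    found = count_source_images(src_dir)
    if found == 0:
        raise ValueError(f"no source images found in {src_dir}")
    num_source = min(found, max_num_source)
    src_path = f"path?={src_dir},name?={name}"
    return ["--num_source", str(num_source), "--src_path", src_path]
```

For a hypothetical folder `../assets/samples/sources/me` holding front, back, and side photos, `build_src_args("../assets/samples/sources/me", "me")` would yield `--num_source 3` together with the matching `--src_path` string. Keeping the two in sync avoids the mismatch reported in the log, where one source image was found while `--num_source 2` was requested.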