yerfor / GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
MIT License

Custom person video output/inference is missing the torso, please help. #207

Open ernestol0817 opened 9 months ago

ernestol0817 commented 9 months ago

After going through all the steps listed in the README documents, my output video is missing the torso. Has anyone seen this behavior before? If so, please help me. I am an English speaker, but I also included a Chinese translation in the original post.

yerfor commented 9 months ago

Hi, please check the loss curves of lm3d_radnerf_torso. I suspect that the training of lm3d_radnerf_torso failed.

ernestol0817 commented 9 months ago

Hello, and thank you so much for responding, your help means a lot! Here is the plot across all epochs for lm3d_radnerf_torso.yaml:

[Image: lm3d_radnerf_torso loss_curve]

Finally, I thought I'd share the specs reported by ffprobe for the input video. I used the specs of May.mp4 as a guideline so that my own input video would match May.mp4 as closely as possible.

(geneface) root@omnia-geneface-bdd5b7dc-fpp86:/workspace/data/raw/videos# ffprobe -v error -show_format -show_streams Richard_matched.mp4
[STREAM]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=Main
codec_type=video
codec_time_base=1/50
codec_tag_string=avc1
codec_tag=0x31637661
width=512
height=512
coded_width=512
coded_height=512
has_b_frames=0
sample_aspect_ratio=1:1
display_aspect_ratio=1:1
pix_fmt=yuv420p
level=31
color_range=unknown
color_space=unknown
color_transfer=unknown
color_primaries=unknown
chroma_location=left
field_order=unknown
timecode=N/A
refs=1
is_avc=true
nal_length_size=4
id=N/A
r_frame_rate=25/1
avg_frame_rate=25/1
time_base=1/12800
start_pts=0
start_time=0.000000
duration_ts=2119680
duration=165.600000
bit_rate=3002369
max_bit_rate=N/A
bits_per_raw_sample=8
nb_frames=4140
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=1
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
TAG:language=und
TAG:handler_name=VideoHandler
[/STREAM]
[STREAM]
index=1
codec_name=aac
codec_long_name=AAC (Advanced Audio Coding)
profile=LC
codec_type=audio
codec_time_base=1/48000
codec_tag_string=mp4a
codec_tag=0x6134706d
sample_fmt=fltp
sample_rate=48000
channels=2
channel_layout=stereo
bits_per_sample=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/48000
start_pts=0
start_time=0.000000
duration_ts=7945632
duration=165.534000
bit_rate=155418
max_bit_rate=317000
bits_per_raw_sample=N/A
nb_frames=7761
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=1
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
TAG:language=und
TAG:handler_name=SoundHandler
[/STREAM]
[FORMAT]
filename=Richard_matched.mp4
nb_streams=2
nb_programs=0
format_name=mov,mp4,m4a,3gp,3g2,mj2
format_long_name=QuickTime / MOV
start_time=0.000000
duration=165.600000
size=65473748
bit_rate=3162982
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:encoder=Lavf58.29.100
TAG:comment=Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
[/FORMAT]
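
For anyone who wants to compare a custom video against May.mp4 without reading the full dump by eye, here is a minimal sketch that parses ffprobe's default key=value output and keeps only a few video fields. The helper names (parse_ffprobe, video_summary, diff_summaries, probe) and the choice of fields in KEYS are my own assumptions for illustration and are not part of GeneFace. The embedded sample is a shortened copy of the video stream shown above.

```python
import subprocess

# Fields compared between two videos; this selection is an assumption,
# not a list of requirements stated by the GeneFace docs.
KEYS = ("codec_name", "width", "height", "pix_fmt", "r_frame_rate")


def parse_ffprobe(text):
    """Parse ffprobe's default output into a list of (section_name, fields)."""
    sections = []
    name, fields = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[/") and line.endswith("]"):
            # Closing marker such as [/STREAM]: store the finished section.
            if fields is not None:
                sections.append((name, fields))
            name, fields = None, None
        elif line.startswith("[") and line.endswith("]"):
            # Opening marker such as [STREAM] or [FORMAT].
            name, fields = line[1:-1], {}
        elif fields is not None and "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value
    return sections


def video_summary(sections, keys=KEYS):
    """Return the selected fields of the first video stream, or None."""
    for name, fields in sections:
        if name == "STREAM" and fields.get("codec_type") == "video":
            return {key: fields.get(key) for key in keys}
    return None


def diff_summaries(mine, reference):
    """Map each differing key to a (mine, reference) pair of values."""
    return {k: (v, reference.get(k)) for k, v in mine.items() if v != reference.get(k)}


def probe(path):
    """Run ffprobe with the same flags used in this thread and return its text."""
    cmd = ["ffprobe", "-v", "error", "-show_format", "-show_streams", path]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Shortened copy of the video stream reported for Richard_matched.mp4.
SAMPLE = """\
[STREAM]
index=0
codec_name=h264
codec_type=video
width=512
height=512
pix_fmt=yuv420p
r_frame_rate=25/1
[/STREAM]
"""

print(video_summary(parse_ffprobe(SAMPLE)))
```

With ffprobe installed, something like diff_summaries(video_summary(parse_ffprobe(probe("Richard_matched.mp4"))), video_summary(parse_ffprobe(probe("May.mp4")))) would list only the fields that differ between the two files.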

yerfor commented 9 months ago

Hi, the PSNR curve seems normal. Can you check the validation results generated during training? They should be at a path like this: checkpoints/May/lm3d_radnerf_torso/validation_results/validation_250000/images/frame_5573.png

ernestol0817 commented 9 months ago

Hi, that's great to hear regarding my PSNR curve. I looked into the validation results generated during training. The directory has ten (10) PNG files.

(geneface) root@geneface-6477c848bb-cnlm4:/workspace# ls checkpoints/Richard/lm3d_radnerf_torso/validation_results/validation_250000/images/
frame_4112.png     frame_4214.png     frame_4316.png     frame_4418.png     frame_4520.png
frame_4112_gt.png  frame_4214_gt.png  frame_4316_gt.png  frame_4418_gt.png  frame_4520_gt.png

I visually inspected each PNG and they all look absolutely perfect.
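
To make the visual check easier to repeat, the following rough sketch pairs each predicted frame with its ground-truth counterpart, so the two can be opened side by side. It assumes only the naming seen in the listing above (frame_N.png next to frame_N_gt.png). The function names pair_frames and pair_frames_in are hypothetical helpers, not GeneFace code.

```python
from pathlib import Path

GT_SUFFIX = "_gt.png"


def pair_frames(names):
    """Return (pairs, unmatched) for an iterable of PNG file names.

    pairs holds (prediction, ground_truth) name tuples; unmatched holds
    any PNG that has no counterpart.
    """
    names = set(names)
    pairs, unmatched = [], []
    for name in sorted(names):
        if name.endswith(GT_SUFFIX):
            prediction = name[: -len(GT_SUFFIX)] + ".png"
            if prediction not in names:
                unmatched.append(name)
        elif name.endswith(".png"):
            ground_truth = name[: -len(".png")] + GT_SUFFIX
            if ground_truth in names:
                pairs.append((name, ground_truth))
            else:
                unmatched.append(name)
    return pairs, unmatched


def pair_frames_in(image_dir):
    """Apply pair_frames to the files found in a validation images directory."""
    return pair_frames(p.name for p in Path(image_dir).iterdir())


# File names copied from the listing in this thread.
LISTING = [
    "frame_4112.png", "frame_4214.png", "frame_4316.png",
    "frame_4418.png", "frame_4520.png",
    "frame_4112_gt.png", "frame_4214_gt.png", "frame_4316_gt.png",
    "frame_4418_gt.png", "frame_4520_gt.png",
]

pairs, unmatched = pair_frames(LISTING)
for prediction, ground_truth in pairs:
    print(prediction, "<->", ground_truth)
print("unmatched:", unmatched)
```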

ernestol0817 commented 9 months ago

I went ahead and ran things again from scratch, going from data prep all the way through. The end result was the same. Looking forward to any help you all might be able to give me. Thanks in advance!

yerfor commented 9 months ago

Hi, since the validation_results contain reasonable output (a predicted torso), we can confirm that training ran normally. So I suspect something is wrong on the inference side. Have you set the correct ckpt paths (head and torso ckpt) in the inference script?
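
As a quick way to see which checkpoints actually exist before editing any script, here is a small sketch that looks for the highest-step checkpoint in the head and torso directories. Two things in it are assumptions that should be checked against your own checkpoints folder: the file name pattern model_ckpt_steps_<N>.ckpt, and the head task directory name lm3d_radnerf. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed checkpoint naming; adjust if your files are named differently.
CKPT_PATTERN = re.compile(r"^model_ckpt_steps_(\d+)\.ckpt$")


def latest_step(file_names):
    """Return the largest step number among checkpoint file names, or None."""
    steps = []
    for name in file_names:
        match = CKPT_PATTERN.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def latest_ckpt(task_dir):
    """Return the path of the highest-step checkpoint in task_dir, or None."""
    task_dir = Path(task_dir)
    if not task_dir.is_dir():
        return None
    step = latest_step(p.name for p in task_dir.iterdir())
    if step is None:
        return None
    return task_dir / f"model_ckpt_steps_{step}.ckpt"


def report(video_id, root="checkpoints", tasks=("lm3d_radnerf", "lm3d_radnerf_torso")):
    """Map each task name (assumed head and torso dirs) to its latest checkpoint."""
    return {task: latest_ckpt(Path(root) / video_id / task) for task in tasks}


# Run from the repository root; a value of None means no checkpoint was found.
for task, ckpt in report("Richard").items():
    print(f"{task}: {ckpt}")
```

If the torso entry prints None, or the Video_ID used at inference time differs from the directory that holds the trained torso model, that mismatch would be worth ruling out first.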

ernestol0817 commented 9 months ago

Hi yerfor! Thanks again for taking the time to help me! It's a massive relief to hear that you agree the training and validation results look good. I'm not sure exactly where to check or change this:

"So I suspect it should be something wrong with the inference code. Have you set the correct ckpt paths (head and torso ckpt) in the inference script?"

Could you give me some pointers on where to look for this in the codebase?

Would the changes you are suggesting be confined to scripts/infer_postnet.sh, like this?

export CUDA_VISIBLE_DEVICES=0
export Video_ID=Richard
export Wav_ID=Richard
export Postnet_Ckpt_Steps=4000 # please reach to docs/train_models.md to get some tips about how to select an appropriate ckpt_steps!

python inference/postnet/postnet_infer.py \
    --config=checkpoints/${Video_ID}/lm3d_postnet_sync/config.yaml \
    --hparams=infer_audio_source_name=data/raw/val_wavs/${Wav_ID}.wav,\
infer_out_npy_name=infer_out/${Video_ID}/pred_lm3d/${Wav_ID}.npy,\
infer_ckpt_steps=${Postnet_Ckpt_Steps} \
    --reset

I should also add that after running infer_lm3d_radnerf.sh I checked the "tmp img dir".
In my case it was here: workspace/infer_out/RichardTwo/pred_video/tmp_imgs/RichardTwo_radnerf_torso_smo

It contains a large number of PNG files, and on visual inspection every one of them is missing the torso... it's just a floating head :)