yerfor / GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
MIT License
2.44k stars 290 forks

Train GeneFace on other target person videos #205

Open ernestol0817 opened 9 months ago

ernestol0817 commented 9 months ago

I'm trying to follow "Train GeneFace on other target person videos" and am running into some issues. I have successfully completed the setup using "May and Zozo", but during data preparation for my own video (which is 3 min 2 sec long) I receive several errors when running:

    conda activate geneface
    export PYTHONPATH=./
    export VIDEO_ID=CustomVideo
    CUDA_VISIBLE_DEVICES=0 data_gen/nerf/process_data.sh $VIDEO_ID

Please note that I have already set up data/raw/videos/CustomVideo.mp4 and egs/datasets/videos/CustomVideo/* (the latter contains files copied from May and then edited by replacing "May" with "CustomVideo").

I believe this might be a data issue and am wondering whether my training video is of the wrong type. The video I'm using is an MP4 (H.264) at HD 720p / 30 fps (1280x720).

Do I need to change my video or am I missing something else? Below is my log output:

(geneface) root@geneface-6477c848bb-cnlm4:/workspace# bash data_gen/nerf/process_data.sh $VIDEO_ID [INFO] ===== extract audio from data/raw/videos/CustomVideo.mp4 to data/processed/videos/CustomVideo/aud.wav ===== ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0) configuration: --prefix=/tmp/build/80754af9/ffmpeg_1587154242452/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264 libavutil 56. 31.100 / 56. 31.100 libavcodec 58. 54.100 / 58. 54.100 libavformat 58. 29.100 / 58. 29.100 libavdevice 58. 8.100 / 58. 8.100 libavfilter 7. 57.100 / 7. 57.100 libavresample 4. 0. 0 / 4. 0. 0 libswscale 5. 5.100 / 5. 5.100 libswresample 3. 5.100 / 3. 5.100 libpostproc 55. 5.100 / 55. 
5.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/raw/videos/CustomVideo.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.45.100 Duration: 00:01:20.30, start: 0.000000, bitrate: 15387 kb/s Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 15208 kb/s, 30.01 fps, 30 tbr, 15360 tbn, 60 tbc (default) Metadata: handler_name : VideoHandler Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 178 kb/s (default) Metadata: handler_name : SoundHandler File 'data/processed/videos/CustomVideo/aud.wav' already exists. Overwrite ? [y/N] y Stream mapping: Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help Output #0, wav, to 'data/processed/videos/CustomVideo/aud.wav': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 ISFT : Lavf58.29.100 Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, stereo, s16, 512 kb/s (default) Metadata: handler_name : SoundHandler encoder : Lavc58.54.100 pcm_s16le size= 5019kB time=00:01:20.31 bitrate= 511.9kbits/s speed= 586x
video:0kB audio:5019kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001518% [INFO] ===== extracted audio ===== [INFO] ===== extract audio labels for data/processed/videos/CustomVideo/aud.wav ===== [INFO] ===== start extract esperanto ===== [INFO] ===== extract images from data/raw/videos/CustomVideo.mp4 to data/processed/videos/CustomVideo/ori_imgs ===== ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0) configuration: --prefix=/tmp/build/80754af9/ffmpeg_1587154242452/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264 libavutil 56. 31.100 / 56. 31.100 libavcodec 58. 54.100 / 58. 54.100 libavformat 58. 29.100 / 58. 29.100 libavdevice 58. 8.100 / 58. 8.100 libavfilter 7. 57.100 / 7. 57.100 libavresample 4. 0. 0 / 4. 0. 0 libswscale 5. 5.100 / 5. 5.100 libswresample 3. 5.100 / 3. 5.100 libpostproc 55. 5.100 / 55. 
5.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/raw/videos/CustomVideo.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.45.100 Duration: 00:01:20.30, start: 0.000000, bitrate: 15387 kb/s Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 15208 kb/s, 30.01 fps, 30 tbr, 15360 tbn, 60 tbc (default) Metadata: handler_name : VideoHandler Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 178 kb/s (default) Metadata: handler_name : SoundHandler Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native)) Press [q] to stop, [?] for help [swscaler @ 0x55da7e515000] deprecated pixel format used, make sure you did set range correctly Output #0, image2, to 'data/processed/videos/CustomVideo/ori_imgs/%d.jpg': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.29.100 Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 1280x720 [SAR 1:1 DAR 16:9], q=1-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default) Metadata: handler_name : VideoHandler encoder : Lavc58.54.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 ALSA lib confmisc.c:767:(parse_card) cannot find card '0'te=N/A speed=12.8x
ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5220:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM sysdefault ALSA lib confmisc.c:767:(parse_card) cannot find card '0' ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5220:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM sysdefault ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50 ALSA lib 
pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958 ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib confmisc.c:767:(parse_card) cannot find card '0' ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5220:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM default ALSA lib confmisc.c:767:(parse_card) cannot find card '0' ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_refer returned error: No such file or 
directory ALSA lib conf.c:5220:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM default ALSA lib confmisc.c:767:(parse_card) cannot find card '0' ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name ALSA lib conf.c:4732:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5220:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM dmix [WARN] audio has 2 channels, only use the first. [INFO] loaded audio stream data/processed/videos/CustomVideo/aud.wav: (1284779,) [INFO] loading ASR model cpierse/wav2vec2-large-xlsr-53-esperanto... /opt/conda/envs/geneface/lib/python3.9/site-packages/transformers/configuration_utils.py:380: UserWarning: Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead, or if you are using the Trainer API, pass gradient_checkpointing=True in your TrainingArguments. warnings.warn( Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. frame= 2008 fps=335 q=1.0 Lsize=N/A time=00:01:20.32 bitrate=N/A speed=13.4x
video:407076kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown [INFO] ===== extracted images ===== [INFO] ===== extract face landmarks from data/processed/videos/CustomVideo/ori_imgs ===== [START] stande ka ĉabano laj restin saves kia oli hia raŭ mi apongis mi a ka po estus signifeken ta uzas sian meksaj loĉada moni onas truj flu oj en anfuam bla evue lasta mon dbegu al sedvo oni donis tuj de si aŭ mi bonan paŭĝin ankoj rkosta ne ĉiu daŭ daŭ meno si kjori apo aŝis sa omer so enir ersnaho foj apo kaj ŝo tinka vere sa ke ŝo sikoriuaĉ jon odi oŭame fork respal enu ŝovas e niru ac la komo ŭe hibelo anaŭ aksepan s i kia nek sap i birda por in o sikavi pafon ne sonĝ as neniakoj verpona lu l abnomo sekj ori tikibjo embakseven hro jn anono sinj o ri estas ea e beest inaŭ so kion li pafon akpaj hinan de estina okora ŝin cara savegiu anlaŭbo trueas en aktos nasus se kudas min s vita kapor kanse ab inaŭ sekjari en troes tas gaja bak oni forsave nu ino sekreti pa fomen kluzon hia s feŝojn en baŭa fiĉis izaŭ asad o ŝiovas por tagin ifo madhasta fine aŭ okspeŝi asestu mosvare enkon tine oforgin ka vas eli hor tos er sonte dazmen ĉis sobtfero ta i u sas [END] [INFO] save all feats for training purpose... [INFO] saved logits to data/processed/videos/CustomVideo/aud_esperanto.npy [INFO] ===== extracted esperanto ===== [INFO] ===== extract deepspeech ===== 0%| | 0/4598 [00:00<?, ?it/s]/root/.tensorflow/models/deepspeech-0_1_0-b90017e8.pb 2023-09-27 02:11:24.038579: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2023-09-27 02:11:24.117748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 43283 MB memory: -> device: 0, name: NVIDIA A40, pci bus id: 0000:17:00.0, compute capability: 8.6 tring to extract deepspeech from audio file: data/processed/videos/CustomVideo/aud.wav The target is: data/processed/videos/CustomVideo/aud_deepspeech.npy /workspace/data_util/deepspeech_features/deepspeech_features.py:50: UserWarning: Audio has multiple channels, the first channel is used warnings.warn( 2023-09-27 02:11:27.323432: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once. The deepspeech extracted successfully, saved at: data/processed/videos/CustomVideo/aud_deepspeech.npy The shape is: (2007, 16, 29) [INFO] ===== extracted deepspeech ===== [INFO] ===== extracted all audio labels ===== 44%|██████████████████████████████████████▍ | 2006/4598 [02:32<02:55, 14.75it/s]/opt/conda/envs/geneface/lib/python3.9/site-packages/face_alignment/api.py:147: UserWarning: No faces were detected. 
warnings.warn("No faces were detected.") [INFO] ===== MAYBE cant find face on data/processed/videos/CustomVideo/ori_imgs/2007.jpg ===== 100%|███████████████████████████████████████████████████████████████████████████████████████▉| 4594/4598 [05:44<00:00, 10.35it/s][INFO] ===== MAYBE cant find face on data/processed/videos/CustomVideo/ori_imgs/4596.jpg ===== 100%|███████████████████████████████████████████████████████████████████████████████████████▉| 4596/4598 [05:44<00:00, 10.78it/s][INFO] ===== MAYBE cant find face on data/processed/videos/CustomVideo/ori_imgs/4597.jpg ===== [INFO] ===== MAYBE cant find face on data/processed/videos/CustomVideo/ori_imgs/4451.jpg ===== 100%|████████████████████████████████████████████████████████████████████████████████████████| 4598/4598 [05:44<00:00, 13.33it/s][INFO] ===== extracted face landmarks ===== [INFO] ===== extract semantics from data/processed/videos/CustomVideo/ori_imgs to data/processed/videos/CustomVideo/parsing ===== [INFO] ===== perform face tracking ===== processed parsing 100 processed parsing 200 processed parsing 300 processed parsing 400 600 loss_lan= 3.0252957344055176 mean_xy_trans= -3.3235762119293213 processed parsing 500 processed parsing 600 processed parsing 700 processed parsing 800 700 loss_lan= 2.848970413208008 mean_xy_trans= -3.910430431365967 processed parsing 900 processed parsing 1000 processed parsing 1100 processed parsing 1200 800 loss_lan= 2.7322795391082764 mean_xy_trans= -4.50252628326416 processed parsing 1300 processed parsing 1400 processed parsing 1500 processed parsing 1600 900 loss_lan= 2.631852626800537 mean_xy_trans= -5.049851894378662 processed parsing 1700 processed parsing 1800 processed parsing 1900 processed parsing 2000 1000 loss_lan= 2.5619239807128906 mean_xy_trans= -5.6157755851745605 processed parsing 2100 processed parsing 2200 processed parsing 2300 processed parsing 2400 processed parsing 2500 1100 loss_lan= 2.5277292728424072 mean_xy_trans= -6.205060958862305 
processed parsing 2600 processed parsing 2700 processed parsing 2800 processed parsing 2900 1200 loss_lan= 2.507002830505371 mean_xy_trans= -6.805706024169922 processed parsing 3000 processed parsing 3100 processed parsing 3200 processed parsing 3300 1300 loss_lan= 2.4901859760284424 mean_xy_trans= -7.41807222366333 processed parsing 3400 processed parsing 3500 processed parsing 3600 processed parsing 3700 1400 loss_lan= 2.480703592300415 mean_xy_trans= -8.033599853515625 processed parsing 3800 processed parsing 3900 processed parsing 4000 processed parsing 4100 1500 loss_lan= 2.4797327518463135 mean_xy_trans= -8.688793182373047 processed parsing 4200 processed parsing 4300 processed parsing 4400 processed parsing 4500 1600 loss_lan= 2.477520704269409 mean_xy_trans= -9.362630844116211 find best focal 1600 [INFO] ===== extracted semantics ===== trained on focal= 1600 best_loss_lan= 3.6974525451660156 mean_xy_trans= -9.82349967956543 Traceback (most recent call last): File "/workspace/data_util/face_tracking/face_tracker.py", line 230, in imgs = np.stack(imgs) File "<__array_function__ internals>", line 5, in stack File "/opt/conda/envs/geneface/lib/python3.9/site-packages/numpy/core/shape_base.py", line 427, in stack raise ValueError('all input arrays must have the same shape') ValueError: all input arrays must have the same shape [INFO] ===== finished face tracking ===== [INFO] ===== extract background image from data/processed/videos/CustomVideo/ori_imgs ===== 100%|██████████████████████████████████████████████████████████████████████████████████████████| 230/230 [13:25<00:00, 3.50s/it]Traceback (most recent call last): File "/workspace/data_util/process.py", line 431, in extract_background(processed_dir, ori_imgs_dir) File "/workspace/data_util/process.py", line 107, in extract_background imgs = np.stack(imgs).reshape(-1, num_pixs, 3) File "<__array_function__ internals>", line 5, in stack File 
"/opt/conda/envs/geneface/lib/python3.9/site-packages/numpy/core/shapebase.py", line 427, in stack raise ValueError('all input arrays must have the same shape') ValueError: all input arrays must have the same shape [ WARN:0@0.021] global loadsave.cpp:248 findDecoder imread('data/processed/videos/CustomVideo/bc.jpg'): can't open/read file: check file path/integrity [INFO] ===== extract head images for data/processed/videos/CustomVideo ===== 0%| | 0/4598 [00:00<?, ?it/s]Traceback (most recent call last): File "/workspace/data_util/process.py", line 435, in extract_head(processed_dir) File "/workspace/data_util/process.py", line 142, in extract_head img[~head_part] = bg_img[~head_part] TypeError: 'NoneType' object is not subscriptable [INFO] ===== save transforms ===== Traceback (most recent call last): File "/workspace/data_util/process.py", line 448, in save_transforms(processed_dir, ori_imgs_dir) File "/workspace/data_util/process.py", line 296, in save_transforms params_dict = torch.load(os.path.join(base_dir, 'track_params.pt')) File "/opt/conda/envs/geneface/lib/python3.9/site-packages/torch/serialization.py", line 791, in load with _open_file_like(f, 'rb') as opened_file: File "/opt/conda/envs/geneface/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like return _open_file(name_or_buffer, mode) File "/opt/conda/envs/geneface/lib/python3.9/site-packages/torch/serialization.py", line 252, in init super().init(open(name, mode)) FileNotFoundError: [Errno 2] No such file or directory: 'data/processed/videos/CustomVideo/track_params.pt' Loading the Wav2Vec2 Processor... Loading the HuBERT Model... Hubert extracted at data/processed/videos/CustomVideo/aud_hubert.npy Mel and F0 extracted at data/processed/videos/CustomVideo/aud_mel_f0.npy loading the model from deep_3drecon/checkpoints/facerecon/epoch_20.pth loading video ... 
extracting 2D facial landmarks ...: 100%|███████████████████████████████████████████████████▉| 2408/2409 [02:43<00:00, 15.00it/s]WARNING: Caught errors when fa.get_landmarks, maybe No face detected at frame 2409 in data/raw/videos/CustomVideo.mp4! extracting 2D facial landmarks ...: 100%|███████████████████████████████████████████████████▉| 2408/2409 [02:43<00:00, 14.71it/s]Traceback (most recent call last): File "/workspace/data_gen/nerf/extract_3dmm.py", line 56, in process_video lm68 = fa.get_landmarks_from_image(frames[i])[0] # 识别图片中的人脸,获得角点, shape=[68,2] TypeError: 'NoneType' object is not subscriptable

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/workspace/data_gen/nerf/extract_3dmm.py", line 112, in process_video(video_fname, out_fname, skip_tmp=False) File "/workspace/data_gen/nerf/extract_3dmm.py", line 59, in process_video raise ValueError("") ValueError | Unknow hparams: [] | Hparams chains: ['egs/egs_bases/radnerf/base.yaml', 'egs/egs_bases/radnerf/lm3d_radnerf.yaml', 'egs/datasets/videos/CustomVideo/lm3d_radnerf.yaml'] | Hparams: accumulate_grad_batches: 1, ambient_out_dim: 2, amp: True, base_config: ['egs/egs_bases/radnerf/lm3d_radnerf.yaml'], binary_data_dir: data/binary/videos, bound: 1, camera_offset: [0, 0, 0], camera_scale: 4.0, clip_grad_norm: 0, clip_grad_value: 0, cond_out_dim: 64, cond_type: idexp_lm3d_normalized, cond_win_size: 1, cuda_ray: True, debug: False, density_thresh: 10, density_thresh_torso: 0.01, desired_resolution: 2048, dt_gamma: 0.00390625, eval_max_batches: 100, exp_name: , far: 0.9, finetune_lips: True, finetune_lips_start_iter: 200000, geo_feat_dim: 128, grid_interpolation_type: linear, grid_size: 128, grid_type: tiledgrid, gui_fovy: 21.24, gui_h: 512, gui_max_spp: 1, gui_radius: 3.35, gui_w: 512, hidden_dim_ambient: 128, hidden_dim_color: 128, hidden_dim_sigma: 128, individual_embedding_dim: 4, individual_embedding_num: 13000, infer: False, infer_audio_source_name: ,
infer_bg_img_fname: , infer_c2w_name: , infer_cond_name: , infer_lm3d_clamp_std: 2.5, infer_lm3d_lle_percent: 0.0, infer_lm3d_smooth_sigma: 0.0, infer_out_video_name: , infer_scale_factor: 1.0, infer_smo_std: 0.0, infer_smooth_camera_path: True9, print_nan_grads: False, processed_data_dir: data/processed/videos,
raw_data_dir: data/raw/videos, resume_from_checkpoint: 0, save_best: True, save_codes: ['tasks', 'modules', 'egs'], save_gt: True, scheduler: exponential, seed: 9999, smo_win_size: 5, smooth_lips: False, task_cls: tasks.radnerfs.radnerf.RADNeRFTask, tb_log_interval: 100, torso_head_aware: False, torso_individual_embedding_dim: 8, torso_shrink: 0.8, update_extra_interval: 16,
upsample_steps: 0, use_window_cond: True, val_check_interval: 2000, valid_infer_interval: 10000, valid_monitor_key: val_loss,
valid_monitor_mode: min, validate: False, video_id: CustomVideo, warmup_updates: 0, weight_decay: 0, with_att: True, work_dir: , loading deepspeech ... loading Esperanto ... loading hubert ... loading Mel and F0 ... loading 3dmm coeff ... Traceback (most recent call last): File "/workspace/data_gen/nerf/binarizer.py", line 277, in binarizer.parse(hparams['video_id']) File "/workspace/data_gen/nerf/binarizer.py", line 267, in parse ret = load_processed_data(processed_dir) File "/workspace/data_gen/nerf/binarizer.py", line 98, in load_processed_data coeff_dict = np.load(coeff_npy_name, allow_pickle=True).tolist() File "/opt/conda/envs/geneface/lib/python3.9/site-packages/numpy/lib/npyio.py", line 416, in load fid = stack.enter_context(open(os_fspath(file), "rb")) FileNotFoundError: [Errno 2] No such file or directory: 'data/processed/videos/CustomVideo/vid_coeff.npy'
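The two `ValueError: all input arrays must have the same shape` tracebacks above come from `np.stack`, which needs every frame array to share one shape. The log also shows ffmpeg writing 2008 frames while the later steps iterate over 4598 images, so leftover frames from an earlier run are one possible (unconfirmed) source of mixed shapes. A minimal sketch for locating odd frames follows; the helper names `find_shape_outliers` and `read_frame_shapes` are hypothetical and not part of GeneFace, and the demo uses toy data rather than real images.

```python
from collections import Counter
from pathlib import Path


def find_shape_outliers(shapes):
    """Return (majority_shape, outliers); outliers maps file name -> shape.

    `shapes` maps file name -> image shape tuple, or None for unreadable files.
    """
    if not shapes:
        return None, {}
    majority_shape, _ = Counter(shapes.values()).most_common(1)[0]
    outliers = {name: s for name, s in shapes.items() if s != majority_shape}
    return majority_shape, outliers


def read_frame_shapes(img_dir):
    """Collect the shape of every .jpg in img_dir (requires OpenCV)."""
    import cv2  # imported lazily so find_shape_outliers works without OpenCV

    shapes = {}
    for path in sorted(Path(img_dir).glob("*.jpg")):
        img = cv2.imread(str(path))  # cv2.imread returns None on failure
        shapes[path.name] = None if img is None else img.shape
    return shapes


# Toy data standing in for
# read_frame_shapes("data/processed/videos/CustomVideo/ori_imgs")
demo = {
    "1.jpg": (720, 1280, 3),
    "2.jpg": (720, 1280, 3),
    "3.jpg": (512, 512, 3),  # different resolution, e.g. from an older run
    "4.jpg": None,           # unreadable file
}
majority, odd = find_shape_outliers(demo)
print("majority shape:", majority)
print("frames to inspect:", odd)
```

Any frame reported here would make `np.stack` fail; removing or re-extracting those frames before rerunning the script is one way to test whether they are the cause.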

ernestol0817 commented 9 months ago

Update:
I made some changes to the code locally and was able to get this working:

1) Global update from get_landmarks (deprecated) to get_landmarks_from_image.

2) In process.py extract_landmarks, I added some code to break out if any frame does not have a face in it or is a black frame:

    ...
    if input is None:
        print(f'[INFO] ===== Image not read properly from {image_path} =====')
        break
    ...
    print(f"Image shape: {input.shape} ::::: in file: {image_path}")
    ...
    if preds is None:
        print(f"[INFO] ===== MAYBE can't find face on {image_path} ===== placing in debug here: debug_folder/{os.path.basename(image_path)}")
        # Debug: Save the problematic frame for inspection
        cv2.imwrite(f'/workspace/debug_folder/{os.path.basename(image_path)}', input)
        break
    elif len(preds) > 0:
        lands = preds[0].reshape(-1, 2)[:,:2]
        np.savetxt(image_path.replace('jpg', 'lms'), lands, '%f')
    else:
        print(f"[INFO] ===== MAYBE can't find face on {image_path} =====")

3) I updated get_render() in render_3dmm.py to set bin_size and max_faces_per_bin. This noticeably slowed down processing, and setting bin_size to zero was my last fallback after attempting a wide variety of combinations:

    ...
    bin_size=0,
    max_faces_per_bin=None,
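For context on item 3: in PyTorch3D, `bin_size` and `max_faces_per_bin` are fields of `RasterizationSettings`. Setting `bin_size=0` selects naive rasterization instead of the coarse-to-fine binned path, which matches the slowdown described above, and `max_faces_per_bin` only applies when binning is used. The sketch below builds the keyword arguments as a plain dict so it runs without PyTorch3D installed. The `image_size`, `blur_radius` and `faces_per_pixel` values are placeholders, not the values used in render_3dmm.py, and both helper names are hypothetical.

```python
def build_raster_kwargs(image_size, use_naive=True):
    """Keyword arguments intended for pytorch3d.renderer.RasterizationSettings.

    image_size, blur_radius and faces_per_pixel are placeholder values;
    keep whatever the existing get_render() call already passes.
    """
    kwargs = {
        "image_size": image_size,
        "blur_radius": 0.0,
        "faces_per_pixel": 1,
    }
    if use_naive:
        # bin_size=0 switches to naive rasterization (slower, no binning)
        kwargs["bin_size"] = 0
        # only consulted when bin_size != 0; None lets PyTorch3D use a heuristic
        kwargs["max_faces_per_bin"] = None
    return kwargs


def make_raster_settings(image_size, use_naive=True):
    """Build the actual settings object (requires PyTorch3D)."""
    from pytorch3d.renderer import RasterizationSettings  # lazy import

    return RasterizationSettings(**build_raster_kwargs(image_size, use_naive))


print(build_raster_kwargs(512))
```

Leaving `bin_size` unset (the `use_naive=False` case here) lets PyTorch3D choose a bin size heuristically, which is usually faster and may be worth retrying once the problematic frames have been cleaned up.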

The above solved my problem, so I thought I would share it with the rest of you. I'm sure my approach isn't the best, so if anyone has better suggestions, I'm happy to test any ideas that could improve on my rather brute-force approach.
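One possible refinement of item 2, offered as an untested sketch rather than a drop-in patch: instead of stopping at the first bad frame, keep scanning and report every unreadable or face-less frame in one pass. `scan_frames` is a hypothetical helper, and `load_image` and `detect_landmarks` are stand-ins for `cv2.imread` and a face_alignment `get_landmarks_from_image` call, which returns None when no face is detected.

```python
def scan_frames(frame_names, load_image, detect_landmarks):
    """Sort frames into usable, unreadable, and face-less lists.

    load_image(name) should return an image or None if it cannot be read.
    detect_landmarks(img) should return a list of landmark arrays,
    or None / an empty list when no face is found.
    """
    good, unreadable, no_face = [], [], []
    for name in frame_names:
        img = load_image(name)
        if img is None:
            unreadable.append(name)
            continue
        preds = detect_landmarks(img)
        if preds is None or len(preds) == 0:
            no_face.append(name)
            continue
        good.append(name)
    return good, unreadable, no_face


# Toy stand-ins so the sketch runs without OpenCV or face_alignment.
fake_images = {"1.jpg": "img1", "2.jpg": None, "3.jpg": "img3"}
fake_landmarks = {"img1": [[(0.0, 0.0)] * 68], "img3": None}

good, unreadable, no_face = scan_frames(
    sorted(fake_images), fake_images.get, fake_landmarks.get
)
print("usable:", good)
print("unreadable:", unreadable)
print("no face:", no_face)
```

With the full list in hand, the bad frames can be inspected or trimmed from the source video in one go, rather than rerunning the pipeline after each failure.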