yerfor / GeneFacePlusPlus

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
MIT License

IndexError: index 676 is out of bounds for dimension 0 with size 676 #22

Open lokvke opened 5 months ago

lokvke commented 5 months ago

| load 'model' from 'checkpoints/audio2motion_vae/model_ckpt_steps_400000.ckpt', strict=True
| WARN: checkpoints/motion2video_nerf/may_torso/lm3d_radnerf_torso.yaml not exist.
| load 'model' from 'checkpoints/motion2video_nerf/may_torso/model_ckpt_steps_250000.ckpt', strict=True
trainval: Smooth head trajectory (rotation and translation) with a window size of 7
/data/zssy-digital-human/projects/gpp/tasks/radnerfs/dataset_utils.py:263: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.lm68s = torch.tensor(self.lm2ds[:, index_lm68_from_lm478, :])
Extracted wav file (16khz) from data/raw/val_wavs/8-27s.wav to data/raw/val_wavs/8-27s_16k.wav.
Loading the HuBERT Model...
/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading the Wav2Vec2 Processor...

Traceback (most recent call last):
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 542, in <module>
    GeneFace2Infer.example_run(inp)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 490, in example_run
    infer_instance.infer_once(inp)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 180, in infer_once
    out_name = self.forward_system(samples, inp)
  File "/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 475, in forward_system
    self.forward_audio2secc(batch, inp)
  File "/data/home/yaokj5/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 384, in forward_audio2secc
    cano_lm3d = inject_blink_to_lm68(cano_lm3d)
  File "/data/zssy-digital-human/projects/gpp/inference/genefacepp_infer.py", line 103, in inject_blink_to_lm68
    lm68[idx, 36:48] = lm68[idx, 36:48] * (1-blink_factor) + closed_eye_lm68[idx, 36:48] * blink_factor
IndexError: index 676 is out of bounds for dimension 0 with size 676

yerfor commented 5 months ago

This looks like an index-out-of-bounds error. Can you provide more details? The code should already have converted the audio to 16 kHz and the video to 25 fps.

Ahmer-444 commented 5 months ago

@lokvke, could you please attempt it using an audio file longer than 10 seconds? In my testing, it consistently fails when the provided audio is less than 8 seconds.

AHarmlessPyro commented 4 months ago

Hey @yerfor. I tried running with longer audio clips as well. For the same audio clip, I tried the full length (around 1 min 30 s) and a 59 s segment. Both failed with a similar error; only the reported index value differed between the two clips (it stayed the same across repeated runs of the same clip). It seems to have worked for a sample that was around 40 s long. All samples were encoded to 16 kHz successfully and, as far as I can tell, the error happens on exactly the same line. Is there any other detail I can provide to help debug this issue?

benchrus commented 4 months ago

Hi, I have the same problem. The IndexError always appears, and I get the same error with driving audio of different lengths. How can I solve this problem?

lokvke commented 4 months ago

In the inject_blink_to_lm68 function, when the generated video contains 676 frames, T=676. So when i=675 and j=1, idx=676, which is out of bounds. Here is my solution:

idx = i % (i + j)

(PS: the blinking result does not seem very natural.)
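For readers following along, here is a minimal sketch of the overflow described above and of one alternative guard. It is not the repository's code: `blink_schedule`, `blend`, and the `blink_starts` parameter are hypothetical names, and stopping at the last frame is an assumed strategy rather than the maintainer's fix. The last line evaluates the proposed `i % (i + j)` for i=675: for j >= 1 the result is always i, and for j = 0 (with i > 0) it is 0, so every step of the blink lands on frame i or frame 0. That may be related to the unnatural look, though this is only a guess.

```python
# Blink weights quoted later in this thread.
BLINK_FACTORS = [0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1]


def blink_schedule(num_frames, blink_starts, factors=BLINK_FACTORS):
    """Map each affected frame index to its blink factor.

    num_frames   -- T, the number of generated frames
    blink_starts -- frames where a blink begins (hypothetical parameter)
    Frames at or beyond num_frames are dropped instead of indexed.
    """
    schedule = {}
    for start in blink_starts:
        for j, factor in enumerate(factors):
            idx = start + j
            if idx >= num_frames:  # idx == T is already out of bounds
                break
            schedule[idx] = factor
    return schedule


def blend(open_value, closed_value, factor):
    """Linear blend of the same form as the failing line in the traceback."""
    return open_value * (1 - factor) + closed_value * factor


# The case from this issue: T = 676 and a blink starting at frame 675.
safe = blink_schedule(676, [675])                               # {675: 0.1}
unguarded = [675 + j for j in range(len(BLINK_FACTORS))]        # 675, 676, ..., 681
modulo = [675 % (675 + j) for j in range(len(BLINK_FACTORS))]   # [0, 675, 675, 675, 675, 675, 675]
```

With the guard, a blink that starts on the last frame is simply truncated, and all earlier blinks still sweep across seven consecutive frames.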

yerfor commented 4 months ago

> In the inject_blink_to_lm68 function, when the generated video contains 676 frames, T=676. So when i=675 and j=1, idx=676, which is out of bounds. Here is my solution:
>
> idx = i % (i + j)
>
> (PS: the blinking result does not seem very natural.)

  • Hi, thanks for your comment. I will include the modification you mentioned in the latest commit.
  • As for the blinking results, the blink motion is controlled by the hard-coded blink_factor_lst = np.array([0.1, 0.5, 0.7, 1.0, 0.7, 0.5, 0.1]) # * 0.9 in the inject_blink_to_lm68 function. You could try different values to make the eye blink look more natural.

MiaoJiawei97 commented 2 weeks ago

Is this issue fixed now?
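For anyone experimenting with the blink_factor_lst values mentioned in yerfor's reply, one way to generate candidate weights is a raised-cosine curve. This is a swapped-in illustration, not something the repository provides: `raised_cosine_blink` is a hypothetical helper, and whether its output looks more natural than the hard-coded list has not been tested here.

```python
import math


def raised_cosine_blink(num_frames=7):
    """Return blink weights that rise smoothly to 1.0 and fall back.

    Weight k is 0.5 * (1 - cos(2*pi*(k + 1) / (num_frames + 1))),
    which peaks at exactly 1.0 in the middle when num_frames is odd.
    """
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * (k + 1) / (num_frames + 1)))
        for k in range(num_frames)
    ]


weights = raised_cosine_blink(7)
rounded = [round(w, 3) for w in weights]
```

The resulting list could be wrapped in np.array(...) and tried in place of the hard-coded values; a larger num_frames would give a slower blink at 25 fps.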