yuangan / EAT_code

Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".

nothing happens when I run demo.py #10

Closed G-force78 closed 1 year ago

G-force78 commented 1 year ago

!python demo.py --root_wav /content/EAT_code/demo/video_processed/output --emo hap

```
deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0% 0/1 [00:00<?, ?it/s]
0it [00:00, ?it/s]
100% 1/1 [00:00<00:00, 3715.06it/s]
```

That's it; nothing is saved anywhere. I am also unsure what this note refers to: "Note 2: Replace the video_name/video_name.wav and the DeepSpeech feature video_name/deepfeature32/video_name.npy, and you can test with a new wav. The output length will depend on the shortest length of the audio and driven poses. Refer to here for more details."

yuangan commented 1 year ago

Can you check your '/content/EAT_code/demo/video_processed/output' folder? Is it organized the same way as '/content/EAT_code/demo/video_processed/obama'? And is there any image in './demo/imgs_cropped'? Here is an output example: [screenshot]. The generated video will be saved at "./demo/output/deepprompt_eam3d_all_final_313".
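A quick comparison like the following can help (a rough sketch; the paths come from the command in this thread and may need adjusting for your setup):

```python
import os

# Hypothetical helper for comparing the two demo folders discussed above.
def show_tree(root):
    print(root)
    for dirpath, _, filenames in os.walk(root):
        for f in filenames:
            print(' ', os.path.relpath(os.path.join(dirpath, f), root))

show_tree('/content/EAT_code/demo/video_processed/obama')    # known-good example
show_tree('/content/EAT_code/demo/video_processed/output')   # your --root_wav folder
print(os.listdir('/content/EAT_code/demo/imgs_cropped'))     # should list source images
```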

G-force78 commented 1 year ago

Hi, yes, it is organized the same way. The same thing happens when I move one of the prepared folders, such as /content/EAT_code/demo/W015_neu_1_002. It seems to be all or nothing. I must be missing something in the directory; where does it draw the emo from? Maybe I'm missing that link? When all folders are included in demo it works, but when I isolate one to test, it doesn't. [screenshot]

yuangan commented 1 year ago

Hi, you can print allimg_cropped and all_wavs2 at this line to check whether any wav or image files are present. It appears that there is no image in ./demo/imgs_cropped. Our demo.py reads the images from that fixed location and the wav from the --root_wav argument.
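For example, a check along these lines (a rough sketch mirroring the globs used in demo.py; the exact paths may differ in your copy) will show whether either list comes back empty, which would explain the silent exit:

```python
import glob
import os

root_wav = '/content/EAT_code/demo/video_processed/output'   # value passed via --root_wav

# Mirrors the inputs demo.py collects; if either list is empty, nothing gets generated.
all_wavs2 = [f'{root_wav}/{os.path.basename(root_wav)}.wav']
allimg_cropped = glob.glob('./demo/imgs_cropped/*.jpg')

print('wav files:', all_wavs2, [os.path.exists(w) for w in all_wavs2])
print('cropped images:', allimg_cropped)
```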

G-force78 commented 1 year ago

I've made it the same as the obama directory and added a jpg to imgs_cropped (it was a png), but now I get this error:

```
========= Extract latent keypoints from New image ======
  0% 0/15 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:521: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.asanyarray(arr)
100% 15/15 [00:00<00:00, 15.26it/s]
  0% 0/1 [00:00<?, ?it/s]
  0% 0/16 [00:00<?, ?it/s]
/content/EAT_code/demo.py:169: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  deep_feature = torch.from_numpy(np.array(deep_feature)).to(torch.float)
  0% 0/16 [00:00<?, ?it/s]
  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/EAT_code/demo.py", line 467, in <module>
    test(f'./ckpt/{name}.pth.tar', args.emo, save_dir=f'./demo/output/{name}/')
  File "/content/EAT_code/demo.py", line 339, in test
    audio_frames, poseimgs, deep_feature, source_img, he_source, he_driving, num_frames, y_trg, z_trg, latent_path_driving = prepare_test_data(img_path, audio_path, config['model_params']['audio2kp_params'], emotype)
  File "/content/EAT_code/demo.py", line 169, in prepare_test_data
    deep_feature = torch.from_numpy(np.array(deepfeature)).to(torch.float)
TypeError: can't convert np.ndarray of type numpy.object. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
```

yuangan commented 1 year ago

It seems the extracted DeepSpeech feature deepfeature is not right. You can print and check deepfeature here. It is expected to be a numpy.ndarray with shape (N, 16, 29) and dtype float32. If this is not the case, please review the preprocessing procedure, as it can be affected by the versions in your installed environment.
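A quick check might look like this (a sketch; point the path at your own deepfeature32 file):

```python
import numpy as np

# Hypothetical path: replace with your video_name/deepfeature32/video_name.npy.
feat = np.load('./demo/video_processed/output/deepfeature32/output.npy', allow_pickle=True)

print(type(feat), feat.dtype, feat.shape)
# A correct extraction is a float32 array of shape (N, 16, 29).
# dtype=object usually means the per-frame windows have different lengths,
# which triggers the TypeError seen above when converting to a tensor.
```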

G-force78 commented 8 months ago

Hi, I thought I would come back to this for another look. I have managed to convert the features to npy files, and the directory seems to contain all the correct parts, but when I run preprocess this error occurs:

```
==================done=====================
Traceback (most recent call last):
  File "/content/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in <module>
    for i, data in tqdm(enumerate(dataset)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 457, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py", line 37, in __getitem__
    A_path = self.A_paths[index]
IndexError: list index out of range
```

```
========== extract latent from cropped videos =======
100% 2/2 [00:00<00:00, 96.13it/s]
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0% 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:521: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.asanyarray(arr)
100% 2/2 [00:12<00:00, 6.36s/it]
=============done==============
=========== extract poseimg from latent =============
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
100% 2/2 [00:00<00:00, 149.36it/s]
============== organize file for demo ===============
```

yuangan commented 8 months ago

Hi, I recommend printing self.A_paths at line 37 of "/content/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py" and checking whether the paths exist. It seems self.A_paths is empty here.
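A debugging sketch could look like this inside __getitem__ (only self.A_paths and the indexing line come from the original code; the rest is illustrative):

```python
# Debugging sketch for __getitem__ in face_preprocess_eat.py (around line 37).
def __getitem__(self, index):
    print(f'{len(self.A_paths)} path(s) collected, requested index {index}')
    if not self.A_paths:
        raise RuntimeError('self.A_paths is empty: the vid2vid preprocessing step '
                           'found no input files at the expected location.')
    A_path = self.A_paths[index]
    ...
```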

G-force78 commented 8 months ago

I have it working now. Other than a few paths needing adjustment for my environment, the main reason a single video could not be tested was this block here, so, as shown below, I simply created new directories to hold the newly processed data:

```python
all_wavs2 = [f'{root_wav}/{os.path.basename(root_wav)}.wav']
allimg = glob.glob('/content/EAT_code/demo/imgs1/*.jpg')
tmp_allimg_cropped = glob.glob('/content/EAT_code/demo/imgs_cropped1/*.jpg')
preprocess_imgs(allimg, tmp_allimg_cropped)  # crop and align images

allimg_cropped = glob.glob('/content/EAT_code/demo/imgs_cropped1/*.jpg')
preprocess_cropped_imgs(allimg_cropped)  # extract latent keypoints if necessary
```

Have you seen the new implementation EMO? It seems to be based on the same code base, but it doesn't need a driving video; it uses an image. https://github.com/HumanAIGC/EMO

yuangan commented 8 months ago

Oh, I'm glad you have solved it. EMO has flooded my notifications. The head speed in EMO seems to be a condition for head pose generation; pose generation has been researched in previous works [1]. I doubt whether they will open-source it, since the project was done for business. It seems impossible to follow such work with my GPUs. If they release the pre-trained model, I will be more interested.

[1] Wang S., Li L., Ding Y., et al. One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 2531-2539. DOI: 10.1609/aaai.v36i3.20154.

G-force78 commented 8 months ago

One last question, why is it always 127 steps no matter the length of the wav/video?

yuangan commented 8 months ago

Your driving video may have 127 frames. [screenshot] As you can see from my Colab output, ./demo/video_processed/W015_neu_1_002 has 88 frames, so EAT generates 88 frames, one for each frame in the driving video, for every source image (maybe you have too many source images). You can try other driving wavs/videos. Theoretically, our model can handle a driving wav/video of any length.
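If you want to confirm how many frames your own driving video has, a quick check like this works (OpenCV; the path is a placeholder):

```python
import cv2

# Placeholder path: point this at your driving video.
cap = cv2.VideoCapture('./demo/video_processed/W015_neu_1_002/W015_neu_1_002.mp4')
print('driving frames:', int(cap.get(cv2.CAP_PROP_FRAME_COUNT)))
cap.release()
```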