Closed · G-force78 closed this 1 year ago
Can you check your '/content/EAT_code/demo/video_processed/output' folder? Is it organized the same as '/content/EAT_code/demo/video_processed/obama'? And is there any image in './demo/imgs_cropped'? Here is an example of the expected output: The generated video will be saved at "./demo/output/deepprompt_eam3d_all_final_313".
Hi, yes it is the same, and the same thing happens when I move in one of the prepared folders, such as /content/EAT_code/demo/W015_neu_1_002. It seems to be all or nothing. I must be missing something in the directory; where does it draw the emo from? Maybe I'm missing that link? When all folders are included in demo it works, but when I isolate one to test, it doesn't.
Hi, you can print the allimg_cropped and all_wavs2 in this line to check if there are any wav or image files present. It appears that there is no image in ./demo/imgs_cropped. Our demo.py reads the images from the specified location and the wav from the argument --root_wav.
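For reference, a minimal sketch of that check, assuming the glob-based lists that appear later in this thread (not verbatim demo.py code); an empty allimg_cropped list is exactly the situation where nothing is generated or saved:

```python
import glob
import os

# Assumed values, matching the snippets later in this thread.
root_wav = './demo/video_processed/output'   # the folder passed via --root_wav
all_wavs2 = [f'{root_wav}/{os.path.basename(root_wav)}.wav']
allimg_cropped = glob.glob('./demo/imgs_cropped/*.jpg')

print('wav files:     ', all_wavs2, [os.path.exists(p) for p in all_wavs2])
print('cropped images:', allimg_cropped)  # an empty list means there is nothing to generate
```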
I've made it the same as the Obama directory and added a jpg to imgs_cropped (it was png); now I get this error:
========= Extract latent keypoints from New image ======
0% 0/15 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:521: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
arr = np.asanyarray(arr)
100% 15/15 [00:00<00:00, 15.26it/s]
0% 0/1 [00:00<?, ?it/s]
0% 0/16 [00:00<?, ?it/s]/content/EAT_code/demo.py:169: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
deep_feature = torch.from_numpy(np.array(deep_feature)).to(torch.float)
0% 0/16 [00:00<?, ?it/s]
0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/content/EAT_code/demo.py", line 467, in
It seems the extracted DeepSpeech feature deepfeature is not right. You can print and check the deepfeature here. It is expected to be a numpy.ndarray with shape (N, 16, 29) and a data type of float32. If this is not the case, please review the preprocessing procedure, as it could be influenced by the installed environment version.
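A quick, hedged way to verify that expectation against the saved feature file; the path below is an assumption based on the video_name/deepfeature32/video_name.npy layout quoted near the end of this thread:

```python
import numpy as np

# Hypothetical path, following the video_name/deepfeature32/video_name.npy convention.
deepfeature = np.load('./demo/video_processed/output/deepfeature32/output.npy')

print(type(deepfeature), deepfeature.dtype, deepfeature.shape)
# Expected: <class 'numpy.ndarray'>, float32, shape (N, 16, 29)
assert deepfeature.dtype == np.float32, 'unexpected dtype'
assert deepfeature.ndim == 3 and deepfeature.shape[1:] == (16, 29), 'unexpected DeepSpeech feature shape'
```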
Hi, I thought I would come back to this for another look. I have managed to convert to npy files, and the directory seems to show all the correct parts; however, when I run the preprocess step this error occurs:
==================done=====================
Traceback (most recent call last):
File "/content/EAT_code/preprocess/vid2vid/data_preprocess.py", line 26, in
========== extract latent from cropped videos =======
100% 2/2 [00:00<00:00, 96.13it/s]
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/2 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/numpy/lib/npyio.py:521: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
arr = np.asanyarray(arr)
100% 2/2 [00:12<00:00, 6.36s/it]
=============done==============
=========== extract poseimg from latent =============
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
100% 2/2 [00:00<00:00, 149.36it/s]
============== organize file for demo ===============
Hi, I recommend printing self.A_paths at line 37 of "/content/EAT_code/preprocess/vid2vid/data/face_preprocess_eat.py" and checking whether the paths exist. It seems that self.A_paths here is empty.
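A small, hedged helper for that kind of check (the placement and the self.A_paths attribute come from the comment above; everything else is illustrative):

```python
import os

def check_paths(paths):
    """Print how many paths were collected and flag any that do not exist."""
    print('number of paths collected:', len(paths))
    for p in paths[:10]:                 # show only the first few entries
        print(p, '->', 'ok' if os.path.exists(p) else 'MISSING')

# Around line 37 of face_preprocess_eat.py one could then call (hypothetical placement):
# check_paths(self.A_paths)
```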
I have it working now. Other than a few paths needing adjusting for my environment, the main reason the single video could not be tested was this block here, so, as shown below, I simply created directories to hold the new processed data.
all_wavs2 = [f'{root_wav}/{os.path.basename(root_wav)}.wav']
allimg = glob.glob('/content/EAT_code/demo/imgs1/*.jpg')
tmp_allimg_cropped = glob.glob('/content/EAT_code/demo/imgs_cropped1/*.jpg')
preprocess_imgs(allimg, tmp_allimg_cropped)  # crop and align images
allimg_cropped = glob.glob('/content/EAT_code/demo/imgs_cropped1/*.jpg')
preprocess_cropped_imgs(allimg_cropped)  # extract latent keypoints if necessary
Have you seen the new implementation EMO? It seems to be based on the same code base, but it doesn't need a driving video; it uses an image. https://github.com/HumanAIGC/EMO
Oh, I'm glad you have solved it. EMO has flooded my notifications. The head speed in EMO seems to be a condition for head pose generation. Pose generation has been researched in previous works [1]. I doubt whether they will open-source it, as the project was done for business. It seems impossible to follow such work with my GPUs. If they release the pre-trained model, I will be more interested.
[1] Wang S., Li L., Ding Y., et al. One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2022: 2531-2539. DOI: 10.1609/aaai.v36i3.20154.
One last question: why is it always 127 steps, no matter the length of the wav/video?
Your driving video may have 127 frames. As you can see from my Colab output, ./demo/video_processed/W015_neu_1_002 has 88 frames, so EAT generates 88 times, once for every frame in the driving video, for every source image (maybe you have too many source images). You can try another driving wav/video. Theoretically, our model can handle a driving wav/video of any length.
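To sanity-check where the step count comes from, one could count the driving-video frames and the source images directly; the paths and video file name below are assumptions, not part of demo.py:

```python
import glob
import cv2  # opencv-python

# Hypothetical paths, mirroring the folders used earlier in this thread.
driving_video = '/content/EAT_code/demo/video_processed/output/output.mp4'
source_imgs = glob.glob('/content/EAT_code/demo/imgs_cropped/*.jpg')

cap = cv2.VideoCapture(driving_video)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

print('driving-video frames:  ', n_frames)            # e.g. 127 or 88
print('source images:         ', len(source_imgs))
print('total generated frames:', n_frames * len(source_imgs))
```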
!python demo.py --root_wav /content/EAT_code/demo/video_processed/output --emo hap
deepprompt_eam3d_all_final_313
cuda is available
/usr/local/lib/python3.10/dist-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0% 0/1 [00:00<?, ?it/s]
0it [00:00, ?it/s]
100% 1/1 [00:00<00:00, 3715.06it/s]
That's it; nothing is saved anywhere. However, I am unsure what this refers to: "Note 2: Replace the video_name/video_name.wav and the DeepSpeech feature video_name/deepfeature32/video_name.npy, and you can test with a new wav. The output length will depend on the shortest length of the audio and the driven poses. Refer to here for more details."
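As far as I can tell, that note just describes the file layout a new wav needs inside its video_processed folder; a hedged check of the two files it names (the clip name here is a placeholder):

```python
import os

# Hypothetical clip name; the layout mirrors what Note 2 describes for a new wav.
video_name = 'output'
root = f'/content/EAT_code/demo/video_processed/{video_name}'

expected = [
    f'{root}/{video_name}.wav',                # the new driving audio
    f'{root}/deepfeature32/{video_name}.npy',  # DeepSpeech features for that wav, shape (N, 16, 29)
]
for path in expected:
    print(path, '->', 'ok' if os.path.exists(path) else 'MISSING')
# Any other processed files present in the obama example folder should be mirrored as well.
```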