geek0075 opened this issue 2 years ago
Hi @geek0075, we have a lot of actors that are willing to be used as avatars. If you managed to make FACIAL work successfully, let me know. We can work together and test multiple scripts until we get optimal results with FACIAL.
Hi @flipkast. Yes, I made FACIAL work. What do you have in mind? Thanks...
Hi @geek0075 you can pm me at niklentis@gmail.com to discuss.
Hi @flipkast. Done....
Hello, can you please share your results (generated video samples)? My results are kinda bad; the eye behaviour is very strange.
Carlson doing Obama Speech https://drive.google.com/file/d/1wh785MZ0W_3Ny5nOgTeVOf0uGH2gjQNV/view?usp=sharing
Carlson doing some speech https://drive.google.com/file/d/1Q0jf1a2MsWfInMHfL0luAVaRuAakHm4m/view?usp=sharing
I trained a Carlson model from a YouTube video of Mr Carlson. I made him do the Obama test speech available in the original FACIAL repository, and also some other speech. Both results are given above... Let me hear your views. Cheers.
Thanks for sharing your results! The video is pixelated; I think you should train the face2video model for more iterations. And he doesn't blink at all, why? Did you remove the OpenFace features?
Yes, the video is pixelated, but only because I increased the size during preprocessing. Otherwise it is good. The output talking head's quality depends a lot on the quality and size of the input training video. I preprocessed my input video by cropping a section and resizing it to 512x512; that's why it's pixelated...
Ideally you would capture the input video as a square at high resolution, focusing on the face only. The input video for that Carlson example was no such thing ;-). I had to crop a square section out of it and then blow it up to 512x512...
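For reference, here is a minimal sketch of the kind of preprocessing I mean, assuming OpenCV; the input file name and crop coordinates are placeholders, not my actual values:

```python
# Minimal sketch: square-crop the face region of a source clip and upscale to 512x512.
# 'carlson_raw.mp4' and the crop box are illustrative; choose a box that covers
# the head and neck, ideally as large as the source allows.
import cv2

cap = cv2.VideoCapture('carlson_raw.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
writer = cv2.VideoWriter('carlson_512.mp4',
                         cv2.VideoWriter_fourcc(*'mp4v'), fps, (512, 512))
x, y, side = 600, 80, 320  # top-left corner and side length of the square crop
while True:
    ok, frame = cap.read()
    if not ok:
        break
    crop = frame[y:y + side, x:x + side]
    # Upscaling a small crop is exactly what introduces the pixelation discussed above.
    writer.write(cv2.resize(crop, (512, 512), interpolation=cv2.INTER_CUBIC))
cap.release()
writer.release()
```

The larger the native crop, the less pixelated the 512x512 result will be.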
No, I did not remove the OpenFace results. The talking head is like the real Carlson training video; there are no blinks there either. I will search my hard disk for the original training video and share it...
Here's a Biden talking head:
https://drive.google.com/file/d/1fg90JjsEyJM5yhr4zIKVEw_ShCehea1R/view?usp=sharing
You can see this is even lower quality than the Carlson one. This is because the input is also low quality...
Here's the original Carlson training video from the YouTube extract:
https://drive.google.com/file/d/1a5wvr9ThOG_SGPRrs8yjoSKTJU9RfSFj/view?usp=sharing
I had to crop it to just the Carlson head, at a resolution of roughly 80x80 (capturing the head and neck up only), then blow that up to 512x512:
https://drive.google.com/file/d/1FnvnvPk7mFC0jSGwPd1etyoIkr8DQaCJ/view?usp=sharing
That is the first frame of the training video. You can see that it is pixelated, just like the output ;-).
@NikitaKononov It's about the quality of the input training video! I used videos extracted from YouTube, which I then preprocessed to bring them to the size and format I use to train FACIAL. YouTube videos are not professionally shot for creating talking head videos ;-).
To build a commercial (or even non-commercial) product out of this, one should shoot the input videos specifically for making talking head videos, kind of like the original training video that comes with the FACIAL repo - you know, that lady talking...
Good luck in your efforts.
I am not really worried about pixelation. I understand how and why it came about. All I have to do is fine-tune with a non-pixelated training video and I will get a non-pixelated talking head. I am 100% sure of this...
Oh yeah, I see. Thanks for your advice, really helpful and interesting! Yes, the videos should be shot specifically for this task.
The idea of FACIAL is great, but I think some things could be improved to boost quality a lot.
I am thinking about replacing the pix2pix model (face2video) with some modern generative model, or scaling up the input/output resolution of the current pix2pix model.
Also, I think DeepSpeech isn't so good at features. Maybe the audio feature mechanism from Wav2Lip would be better. We would be able to train an ultimate audio2face model on a large dataset, and then train only face2video for any given person.
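To make that concrete: Wav2Lip-style audio features are essentially log-mel spectrograms. A minimal sketch with librosa, where the parameters approximate Wav2Lip's defaults and the file name is a placeholder:

```python
# Minimal sketch: log-mel spectrogram features in the style of Wav2Lip, as an
# alternative to FACIAL's DeepSpeech features. 16 kHz, 80 mel bins and a 12.5 ms
# hop approximate Wav2Lip's defaults; 'train1.wav' is illustrative.
import librosa
import numpy as np

wav, sr = librosa.load('train1.wav', sr=16000)
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=800,
                                     hop_length=200, win_length=800, n_mels=80)
log_mel = np.log(np.clip(mel, 1e-5, None))  # shape: (80, n_hops)

# FACIAL consumes one audio-feature window per video frame; at 25 fps a video
# frame spans 40 ms, i.e. 3.2 mel frames, so the windows have to overlap.
print(log_mel.shape)
```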
I have big plans for research and development on this repo.
Have you made any improvements to this repo, or do you use the code as-is? Maybe you have some advice on choosing hyperparameters for the models in this work, or something else. It would be great if you would be so kind as to share some tips. Thanks!
Hey @NikitaKononov, sorry for the late response. I was looking at pix2pix and researching whether a better image-to-image model exists. I have not found anything yet. I notice that FACIAL uses Nvidia's pix2pixHD:
https://catalog.ngc.nvidia.com/orgs/nvidia/models/pix2pixhd
https://github.com/NVIDIA/pix2pixHD
StyleGAN is state of the art (also created by Nvidia), and I found an image-to-image model based on StyleGAN:
https://arxiv.org/abs/2008.00951
My research into FACIAL is just 2 months old ;-). Modifying and improving the model should take much longer than 2 months; in 2 months I was only able to get the code working properly as-is. Improvements require experimentation, which takes time...
The version of DeepSpeech used in FACIAL is old. You might simply start by retraining a more recent version of DeepSpeech and using it with the current FACIAL pipeline. I had considered doing that but stopped due to lack of infrastructure...
I like your enthusiasm on the FACIAL project. Please feel free to keep me abreast of your research...Cheers.
Hi @geek0075! I saw the results of your reproduction of FACIAL, and you did a very good job. I am also very interested in FACIAL, but I have encountered some problems while running it. For example, there are no obama2/test_1.avi, obama2/test_1_audio.avi, or obama2/test_1_audio.mp4 files under the path ../examples/test_image/. The corresponding code is:
video_new = '../examples/test_image/obama2/test_1.avi'
output = '../examples/test_image/obama2/test_1_audio.avi'
output_mp4 = '../examples/test_image/obama2/test_1_audio.mp4'
Have you encountered similar problems? I am looking forward to your reply. Thanks!
@geek0075 good job! How did you make the training code work? Which version of TensorFlow did you use? I have encountered some problems trying to train this model; would you mind helping me with issue #81? Thank you very much!
Hi. I have been busy. I will review and get back to you on your comments. Thanks.
@geek0075 Hi, I am looking forward to your reply. Thanks
@Mxwgreat Again, accept my apologies for my late response. Attached is a sort of guide I wrote a while ago when I first started with FACIAL ;-). I have outgrown it, but I am sure it will still be useful to you. Please have a read through every section and let me know if it addresses your issues...
Cheers.
@geek0075 Thanks for your sharing! I will read it carefully.
@Mxwgreat You're welcome. Like I said, I created it at the start of my journey with FACIAL. If you read the issues I raised, you can see the problems I encountered and how they were solved. I also gained many great insights into creating talking heads with FACIAL, but then I dropped it and am now focused on other development efforts...
https://github.com/zhangchenxu528/FACIAL/issues/61
https://github.com/zhangchenxu528/FACIAL/issues/66
https://github.com/NVlabs/nvdiffrast/issues/77
Also, many small issues will crop up along the way. You can check with me and I will let you know how I eventually solved them...
Cheers.
Hi @geek0075! I saw all your issues. I used deep3d_pytorch to generate lots of .mat files; how do I use them to replace those of the example in video_preprocess? Your guide did not show how to change this.
Hi @amzzz2020. Let me get back to you ASAP. Cheers.
@amzzz2020 Please have a look here: https://colab.research.google.com/drive/1Z1tFPFf-O_HpaxshTqKM24TC_rrjR7Xc?usp=sharing
@amzzz2020 I discussed this issue here:
https://github.com/zhangchenxu528/FACIAL/issues/61
If you look at file:
https://github.com/zhangchenxu528/FACIAL/blob/main/face_render/handle_netface.py
You will see that the *.mat files are loaded from '/content/FACIAL/video_preprocess/train1_deep3Dface'. So you either place them there, or place them elsewhere and point to them with the '--param_folder' parameter.
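As a quick sanity check before running handle_netface.py, you can inspect one of the .mat files; a minimal sketch (the coefficient key names vary between Deep3DFace versions, so just print what is there):

```python
# Minimal sketch: inspect the first Deep3DFace .mat file before handing the
# folder to handle_netface.py. The default folder below is what the script
# expects; override it with --param_folder if your files live elsewhere.
import glob
from scipy.io import loadmat

param_folder = '/content/FACIAL/video_preprocess/train1_deep3Dface'
mats = sorted(glob.glob(param_folder + '/*.mat'))
print(f'found {len(mats)} .mat files')
data = loadmat(mats[0])
print({k: getattr(v, 'shape', type(v)) for k, v in data.items()
       if not k.startswith('__')})
```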
Let me know how this works out for you...
Cheers.
@geek0075 Thank you for your reply. Following the previous issue, I have solved the problem, but the generated result is very poor.
Hi @amzzz2020, please share your result here. Thanks.
Hi @geek0075, eyemask.npy in the face_render folder is loaded directly by render_netface_fitpose.py and rendering_gaosi.py, for example mask3 = np.load('eyemask.npy'). I would be interested to know how eyemask.npy is generated; in this project it is simply provided in the face_render folder. Have you thought about this? I am looking forward to your reply. Thanks!
@Mxwgreat sorry for my late response. I have not explored as far as the eyemask. I have been up to many different things since exploring this repository. I wish I had the resources to keep exploring...
Thanks for your reply. I will continue to explore!
Hi All, Hi Professor @zhangchenxu528,
I have been working with this repository for over a month now. I raised an issue (https://github.com/zhangchenxu528/FACIAL/issues/61), which has been successfully resolved, so I am now able to successfully create talking heads using this repository.
However, there are several other issues to consider, even after being able to successfully create a talking head. Some of these issues include but are not limited to:
So, in accordance with the above, there exist the following scripts (text content to be read):
My first question related to deploying a useful FACIAL network is this:
What is an optimal training script for the training avatar to read, such that, once trained, the avatar can successfully read any and every possible test script? I have seen that several shops have their avatar actors read text pangrams (https://en.wikipedia.org/wiki/Pangram). Is this the way to go?
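To make the pangram idea concrete, here is a rough sketch of how one might instead check a candidate training script for phoneme coverage, assuming the third-party pronouncing package (a CMUdict lookup); the script text is illustrative:

```python
# Minimal sketch: estimate ARPABET phoneme coverage of a candidate training
# script. Uses the 'pronouncing' package; out-of-dictionary words are skipped.
import pronouncing

ARPABET = {
    'AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'B', 'CH', 'D', 'DH', 'EH', 'ER',
    'EY', 'F', 'G', 'HH', 'IH', 'IY', 'JH', 'K', 'L', 'M', 'N', 'NG', 'OW',
    'OY', 'P', 'R', 'S', 'T', 'TH', 'UH', 'UW', 'V', 'W', 'Y', 'Z', 'ZH',
}

def phoneme_coverage(text):
    covered = set()
    for word in text.lower().split():
        prons = pronouncing.phones_for_word(word.strip('.,!?;:"'))
        if prons:
            covered.update(p.rstrip('012') for p in prons[0].split())
    return covered

script = "The quick brown fox jumps over the lazy dog."  # illustrative pangram
covered = phoneme_coverage(script)
print(f'{len(covered)}/{len(ARPABET)} phonemes; missing: {sorted(ARPABET - covered)}')
```

A letter pangram covers every letter but not every phoneme, so a phoneme-coverage check like this seems more directly relevant to training the audio2face stage.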
I am hoping to get some direction on this issue from Professor @zhangchenxu528...
Kind Regards & Thanks.
Kay.