zhoudaquan / ChatAnything

Official Repo for the Paper: CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS

real-time chat #2

Open jingli-wtbox opened 8 months ago

jingli-wtbox commented 8 months ago

Thank you for sharing such great work. It's awesome.

It looks like real-time chat when I go through some of the examples, such as "Examples on Image-based Chat Persona" on the page below:

[example image]

May I know if ChatAnything supports real-time chat?

thanks

ermu2001 commented 8 months ago

Setting up the conversation usually takes around 60 sec.

Afterwards, chatting usually takes about 6 sec to get a response from ChatGPT.

I tested on one GPU; rendering takes around 8 sec (RTX A5000). But the rendering in SadTalker could be parallelized.

You can try running it locally and see whether it is real-time. :)

jingli-wtbox commented 8 months ago

Thank you. I will give it a try on other types of GPUs.

puffy310 commented 8 months ago

Could you theoretically just run this on 8x H100 and have it work in "real-time"? Maybe a real-time conversation version of this software should be looked into.

zhoudaquan commented 8 months ago

> Could you theoretically just run this on 8x H100 and have it work in "real-time"? Maybe a real-time conversation version of this software should be looked into.

Hi, thanks for your interest in the work! We do not have an H100 at hand right now... However, based on our observations on A100 GPUs, the total time cost excluding GPT API calls is within 10 s, and the face rendering process takes 1-2 s. We will try to replace the ChatGPT APIs for real-time chat in the coming month.

tolecy commented 8 months ago

Great project! I replaced ChatGPT with my own small model and tested it on my own 3080 Ti graphics card; the time consumption details are as follows:

===================================
Face Renderer:: 100%|80/80 [00:22<00:00, 3.49it/s]
fps:25.0
OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
seamlessClone:: 100%|318/318 [00:15<00:00, 20.66it/s]

I wonder if anyone has an efficient implementation or ideas for accelerating the video generation process; I have been interested in this recently. What I want to do now is to stream the generated face frames in sync with the voice once TTS is complete. However, because face generation is relatively slow, the streamed output ends up very jerky.

(My goal now is to be as smooth as D-ID: input any image and voice, and quickly generate a video or a smooth streaming output.)
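One way to picture that sync problem is a producer/consumer pair: the renderer feeds a frame queue while playback drains it at 25 fps. The sketch below is only an illustration under assumed placeholder callables (`render_frame`, `show_frame`), not code from this repo; it also shows why rendering slower than 1/fps makes playback stutter.

```python
# Hypothetical sketch (not ChatAnything code): stream rendered frames against
# the audio clock. render_frame() and show_frame() are placeholder callables.
# If render_frame() averages slower than 1/FPS, the queue runs dry and the
# playback stutters -- the jerkiness described above.
import queue
import threading
import time

FPS = 25

def producer(render_frame, n_frames, q):
    for i in range(n_frames):
        q.put(render_frame(i))   # however the talking-head model produces frame i
    q.put(None)                  # end-of-stream marker

def consumer(show_frame, q):
    start = time.monotonic()
    i = 0
    while True:
        frame = q.get()
        if frame is None:
            break
        # Hold each frame until its presentation time on the shared clock.
        delay = start + i / FPS - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        show_frame(frame)        # e.g. push to the video sink / browser
        i += 1

def stream(render_frame, show_frame, n_frames):
    q = queue.Queue(maxsize=2 * FPS)   # about two seconds of frame buffer
    t = threading.Thread(target=producer, args=(render_frame, n_frames, q))
    t.start()
    consumer(show_frame, q)
    t.join()
```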

tolecy commented 8 months ago


btw, this is the message driving the face rendering process above: "Thank you for the kind words. It is a pleasure to meet you as well. I am here to share the magic and beauty of the world around us. If you have any questions or need any guidance, I am always here to help."

puffy310 commented 8 months ago

How did you replace ChatGPT, with another OpenAI model or a locally hosted OpenAI API compatible program?

tolecy commented 8 months ago

> How did you replace ChatGPT, with another OpenAI model or a locally hosted OpenAI API compatible program?

I simply wrapped my local model as a service (with an input/output format similar to OpenAI's), deployed it locally, and then made some modifications to /chat_anything/chatbot/chat.py.
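For reference, here is a minimal sketch of what such a local OpenAI-style wrapper could look like. The model name, route, and response fields are illustrative assumptions, not part of ChatAnything, and the exact edits needed in chat_anything/chatbot/chat.py depend on how it builds its LLM calls.

```python
# Hypothetical sketch: a local chat endpoint that mimics the shape of an
# OpenAI chat completion response, backed by a small Hugging Face model.
# MODEL_NAME, the route, and the response fields are illustrative only.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2-1.5B-Instruct"  # any small local chat model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

app = FastAPI()

class ChatRequest(BaseModel):
    messages: list[dict]      # [{"role": "user", "content": "..."}, ...]
    max_tokens: int = 256

@app.post("/v1/chat/completions")
def chat(req: ChatRequest):
    # Build the prompt with the model's own chat template.
    input_ids = tokenizer.apply_chat_template(
        req.messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=req.max_tokens)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    # Mirror the minimal shape of an OpenAI chat completion response.
    return {"choices": [{"message": {"role": "assistant", "content": reply}}]}
```

Keeping the request/response shape OpenAI-like means only the completion call has to change in the chatbot code; the rest of the pipeline stays untouched.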

ermu2001 commented 7 months ago

> Great project! I replaced ChatGPT with my own small model and tested it on my own 3080 Ti graphics card; the time consumption details are as follows: [...]

The facial image generation only executes once -- at the first round of the conversation ("..., Bot: how are you doing..."). Since it only happens once, I think the latency would be acceptable.

And by the way, this step "seamlessClone:: 100%|318/318 [00:15<00:00, 20.66it/s]" comes from an option that tells SadTalker not to crop out the face for rendering but to paste the rendered face back onto the full image. You can disable it by unchecking "Use full body instead of a face." on the settings tab. It seems unoptimized and takes up a lot of time O.o

https://github.com/zhoudaquan/ChatAnything/blob/main/chat_anything/sad_talker/utils/paste_pic.py#L59-L65
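Since those per-frame seamlessClone calls are independent of each other, one possible (untested, not part of the repo) way to cut the wall-clock cost is to spread them across worker processes. The frame list, mask, and paste center below are placeholders for whatever paste_pic.py actually computes.

```python
# Hypothetical sketch: parallelizing independent per-frame seamlessClone pastes.
# frames/faces/mask/center are placeholders for the values paste_pic.py prepares.
import cv2
from concurrent.futures import ProcessPoolExecutor

def paste_frame(args):
    full_frame, face_patch, mask, center = args
    # Blend the rendered face patch back into the full-body frame.
    return cv2.seamlessClone(face_patch, full_frame, mask, center, cv2.NORMAL_CLONE)

def paste_all(frames, faces, mask, center, workers=8):
    jobs = [(frame, face, mask, center) for frame, face in zip(frames, faces)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(paste_frame, jobs))
```

Of course, simply unchecking the full-body option as suggested above avoids the cost entirely.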

puffy310 commented 7 months ago

Very excited to see more progress in this area!

tolecy commented 7 months ago

> The facial image generation only executes once -- at the first round of the conversation, so the latency should be acceptable. [...] You can disable it by unchecking "Use full body instead of a face." on the settings tab.

Yep. When running on a 4090 (considering only the face render), the time required for video generation is not significantly longer than the video length, so in theory a streaming output (at 25 fps) could feel relatively smooth.

Currently I am trying to integrate Live2D, and further on I hope to input a custom full-body image for full-body driving (that is my next plan), just like a prepared Live2D model, but I don't have much experience in this area of CV. Any suggestions about this?