mslavescu opened this issue 3 years ago
The demo colors seem to be off; you might need to convert BGR to RGB. I also tried to run video, but it takes too long on CPU. Were you able to get this working fast on CPU for video?
Thanks for sharing, it's very cool! Running videos on CPU is definitely very slow. A potential way to speed it up might be to batch the operation; in the notebook right now, I am running one image at a time.
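Roughly, a batched version would look something like this (just a sketch; `model(batch)` stands in for the actual generator call in the notebook, which also takes style codes):

```python
import torch

def translate_batch(model, frames, batch_size=8):
    """Run inference on a list of preprocessed frame tensors in batches.

    `frames` is assumed to be a list of (3, H, W) tensors that already went
    through the notebook's resize/normalize transform; `model(batch)` is a
    placeholder for the real generator call.
    """
    outputs = []
    with torch.no_grad():
        for i in range(0, len(frames), batch_size):
            batch = torch.stack(frames[i:i + batch_size])  # (B, 3, H, W)
            outputs.append(model(batch))                   # one forward pass per batch
    return torch.cat(outputs, dim=0)
```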
> The demo colors seem to be off; you might need to convert BGR to RGB. I also tried to run video, but it takes too long on CPU. Were you able to get this working fast on CPU for video?
@AK391 I ran this demo on an Nvidia 3080 Mobile, getting about 50 FPS on 240x320 images.
You can reproduce my demo very easily using this Google Colab notebook: just run the setup steps up to the MediaPipe section, then jump to the GANsNRoses section and run those steps: https://colab.research.google.com/github/OSSDC/OSSDC-VisionAI-Core/blob/master/OSSDC_VisionAI_demo_reel.ipynb
You may be right about BGR2RGB; try doing the conversion before this line and see if it looks better: https://github.com/OSSDC/OSSDC-VisionAI-Core/blob/master/video_processing_GANsNRoses.py#L116
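Something like this right before that line should do it (untested sketch; `frame` stands for the OpenCV image variable used there):

```python
import cv2

# OpenCV decodes frames as BGR; the model expects RGB, so convert before preprocessing.
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```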
> Thanks for sharing, it's very cool! Running videos on CPU is definitely very slow. A potential way to speed it up might be to batch the operation; in the notebook right now, I am running one image at a time.
Thanks @mchong6! I ran it on GPU with a live video stream from an Android phone over WebRTC, and it is pretty fast on Google Colab as well.
Is the PIL Image a requirement for the network input? If I could use the OpenCV image directly and skip the conversion steps, it would be even faster.
I noticed that it works well when the face fills the image. In my demo video I just used the Android phone's back camera and tried to frame the face like in your GIF here: https://github.com/mchong6/GANsNRoses/blob/main/teaser.gif until the result looked good.
The network input is a tensor, so how you load it does not matter. I loaded it with PIL because it plays well with the torchvision transforms, which resize the image and convert it into a tensor.
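If you want to skip PIL, something like this should work (sketch only, assuming 256x256 input and 0.5 mean/std normalization; use whatever the actual transform in the repo does):

```python
import cv2
import torch

def frame_to_tensor(frame_bgr, size=256):
    # Convert the OpenCV BGR frame to RGB, resize, and normalize to [-1, 1]
    # without going through PIL.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (size, size))
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0  # (3, H, W), in [0, 1]
    return ((tensor - 0.5) / 0.5).unsqueeze(0)                       # (1, 3, H, W), in [-1, 1]
```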
Yes, the network is trained on cropped faces, so it will work best with similar framings.
@mchong6 thanks for open sourcing this project! It is really fun!
Here is a realtime demo using your implementation:
Have fun with GANsNRoses - using OSSDC VisionAI realtime video processing platform https://www.youtube.com/watch?v=YZTzjk_qh4w
More details are in the video description. It takes less than 5 minutes to run in Google Colab, with realtime video streamed from the camera of any Android 4.2.2 phone/tablet/media player.