mslavescu opened this issue 3 years ago
The demo colors seem to be off; you might need to convert BGR to RGB. I also tried to run video, but it takes too long on CPU. Were you able to get this working fast on CPU for video?
Thanks for sharing, it's very cool! Running videos on CPU is definitely very slow. A potential way to speed it up might be to batch the operation; in the notebook right now, I am running one image at a time.
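Roughly, a batched version would look something like this (just a sketch; `model(batch)` stands in for the actual generator call in the notebook, which also takes style codes):

```python
import torch

def translate_batch(model, frames, batch_size=8):
    """Run inference on a list of preprocessed frame tensors in batches.

    `frames` is assumed to be a list of (3, H, W) tensors that already went
    through the notebook's resize/normalize transform; `model(batch)` is a
    placeholder for the real generator call.
    """
    outputs = []
    with torch.no_grad():
        for i in range(0, len(frames), batch_size):
            batch = torch.stack(frames[i:i + batch_size])  # (B, 3, H, W)
            outputs.append(model(batch))                   # one forward pass per batch
    return torch.cat(outputs, dim=0)
```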
> The demo colors seem to be off; you might need to convert BGR to RGB. I also tried to run video, but it takes too long on CPU. Were you able to get this working fast on CPU for video?
@AK391 I ran this demo on an Nvidia 3080 Mobile, getting about 50 FPS on 240x320 images.
You can reproduce my demo very easily using this Google Colab notebook: just run the setup steps up to the MediaPipe section, then jump to the GANsNRoses section and run those steps: https://colab.research.google.com/github/OSSDC/OSSDC-VisionAI-Core/blob/master/OSSDC_VisionAI_demo_reel.ipynb
You may be right about BGR2RGB; try doing the conversion before this line and see if it looks better: https://github.com/OSSDC/OSSDC-VisionAI-Core/blob/master/video_processing_GANsNRoses.py#L116
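Something like this right before that line should do it (untested sketch; `frame` stands for the OpenCV image variable used there):

```python
import cv2

# OpenCV decodes frames as BGR; the model expects RGB, so convert before preprocessing.
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```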
> Thanks for sharing, it's very cool! Running videos on CPU is definitely very slow. A potential way to speed it up might be to batch the operation; in the notebook right now, I am running one image at a time.
Thanks @mchong6! I ran it on GPU with a live video stream from an Android phone over WebRTC, and it is pretty fast on Google Colab as well.
Is the PIL Image a requirement for the network input? If I could use the OpenCV image directly and skip the conversion steps, it would be even faster.
I noticed that it works well when the face fills the image. In my demo video I just used the Android phone's back camera and tried to frame the face like in your GIF here: https://github.com/mchong6/GANsNRoses/blob/main/teaser.gif until the result looked good.
The network input is a tensor, so how you load it does not matter. I loaded it with PIL because it plays well with the torchvision transforms, which resize the image and convert it into a tensor.
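If you want to skip PIL, something like this should work (sketch only, assuming 256x256 input and 0.5 mean/std normalization; use whatever the actual transform in the repo does):

```python
import cv2
import torch

def frame_to_tensor(frame_bgr, size=256):
    # Convert the OpenCV BGR frame to RGB, resize, and normalize to [-1, 1]
    # without going through PIL.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (size, size))
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0  # (3, H, W), in [0, 1]
    return ((tensor - 0.5) / 0.5).unsqueeze(0)                       # (1, 3, H, W), in [-1, 1]
```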
Yes, the network is trained on cropped faces, so it will work best with similar framings.
@mchong6 thanks for open sourcing this project! It is really fun!
Here is a realtime demo using your implementation:
Have fun with GANsNRoses - using OSSDC VisionAI realtime video processing platform https://www.youtube.com/watch?v=YZTzjk_qh4w
More details are in the video description. It takes less than 5 minutes to run in Google Colab, with realtime video streamed from the camera of any Android 4.2.2 phone/tablet/media player.