Closed: johndpope closed this issue 8 months ago
Training code on the CelebV-HQ dataset (not using Stable Diffusion under the hood): https://github.com/JialeTao/MRFA
Does the 150-million-image dataset used for EMO need to contain only human faces? Or does it need face images from multiple angles? Please let me know. Thank you.
Using the 40 GB torrent we have lots of MP4s, plus a Dataset class — we can load the videos and audio, then extract the frames / audio features.
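A minimal sketch of the indexing step described above, using only the standard library. The directory layout, the `VideoClip` record, and the function name are assumptions for illustration; actual frame extraction and audio-feature computation would be done with a tool such as ffmpeg or torchvision inside the Dataset's `__getitem__`.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class VideoClip:
    # Hypothetical record pairing an MP4 with the cache locations
    # its frames and audio track would be written to.
    path: Path
    frame_dir: Path
    audio_path: Path

def index_torrent_dir(root: str) -> list[VideoClip]:
    """Walk the torrent download directory and pair each MP4 with
    the paths its extracted frames / audio would be cached under."""
    clips = []
    for mp4 in sorted(Path(root).rglob("*.mp4")):
        clips.append(VideoClip(
            path=mp4,
            frame_dir=mp4.with_suffix(""),      # e.g. clip_001/ for clip_001.mp4
            audio_path=mp4.with_suffix(".wav"), # extracted audio track
        ))
    return clips
```

Decoding is deferred on purpose: indexing 40 GB of MP4s up front is cheap, while frames and audio features are loaded lazily per clip during training.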
https://github.com/johndpope/Emote-hack/blob/main/Net.py#L1004
See training stages 1/2/3.
Download the required datasets. Write data loaders and preprocessors as per the paper's specifications.
150 million images; 250 hours of video.