kent0304 opened this issue 3 years ago
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning (arXiv)
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss (PDF)
Scripted Video Generation With a Bottom-Up Generative Adversarial Network (IEEE Xplore)
Video Generation From Text (arXiv)
This paper tackles the task by training a conditional generative model that extracts both static and dynamic information from text, within a hybrid framework that combines a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN).
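To make "hybrid VAE + GAN" concrete, here is a minimal, framework-free sketch of the loss terms such a model typically combines: a reparameterized latent sample, a KL term pulling the text-conditioned latent toward a standard normal prior, a reconstruction term, and a standard adversarial term. It does not reproduce the paper's architecture or exact objective. The function names, the weights `beta` and `lam`, and the use of mean squared error for reconstruction are all assumptions for illustration. Videos are treated as flat lists of floats to keep the example dependency-free.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """VAE reparameterization: z = mu + sigma * eps, sigma = exp(0.5 * logvar), eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between a target video and its reconstruction (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    """Standard GAN discriminator loss; d_* are probabilities that the input is real."""
    return -math.log(d_real + eps) - math.log(1.0 - d_fake + eps)


def generator_loss(d_fake: float, eps: float = 1e-12) -> float:
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -math.log(d_fake + eps)


def hybrid_generator_objective(video: Sequence[float], reconstruction: Sequence[float],
                               mu: Sequence[float], logvar: Sequence[float],
                               d_fake: float, beta: float = 1.0, lam: float = 1.0) -> float:
    """Hypothetical hybrid objective: reconstruction + beta * KL + lam * adversarial term.

    The weighting scheme is an assumption, not the paper's formulation.
    """
    return (mse(video, reconstruction)
            + beta * kl_to_standard_normal(mu, logvar)
            + lam * generator_loss(d_fake))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical outputs of a text-conditioned encoder (static latent parameters).
    mu, logvar = [0.2, -0.1], [0.0, -0.5]
    z = reparameterize(mu, logvar, rng)
    print("latent sample:", z)
    print("generator objective:",
          hybrid_generator_objective([0.5, 0.4, 0.3], [0.45, 0.42, 0.31], mu, logvar, d_fake=0.3))
    print("discriminator loss:", discriminator_loss(d_real=0.8, d_fake=0.3))
```

The VAE part (reparameterization plus KL) gives a smooth, sampleable latent space conditioned on the text, while the adversarial term pushes generated frames toward looking realistic; in practice each term would be computed on tensors by a deep learning framework rather than on Python lists.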