kent0304 / Text2ImageRetrieval

Extract matching images from the input text in the dataset by using the triplet margin loss function.
2 stars 1 forks source link

Survey #4

Open kent0304 opened 3 years ago

kent0304 commented 3 years ago

Video Generation From Text arxiv

Tackle this task by training a conditional generative model to extract both static and dynamic information from text. In a hybrid framework, employing a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN)

kent0304 commented 3 years ago

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning arxiv

kent0304 commented 3 years ago

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss pdf

kent0304 commented 3 years ago

Scripted Video Generation With a Bottom-Up Generative Adversarial Network IEEE Xplore