StyleNet is a novel framework for generating attractive captions for images and videos in different styles. Its core model component, the factored LSTM, automatically distills the style factors from a monolingual text corpus.
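To make the idea of a factored LSTM concrete, here is a minimal NumPy sketch of a single time step. It assumes the factorization commonly associated with StyleNet, in which each input-to-hidden weight matrix is written as W_x = U S V, with U and V shared across styles and S specific to one style. The class name `FactoredLSTMCell`, the dimensions, the initialization scale, and the style names are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used for the LSTM gates."""
    return 1.0 / (1.0 + np.exp(-z))


class FactoredLSTMCell:
    """Illustrative LSTM cell whose input weights factor as W_x = U @ S @ V.

    Assumption: U and V are shared by all styles, S is kept per style.
    The leading axis of size 4 stacks the input, forget, output and
    candidate transforms.
    """

    def __init__(self, input_dim, hidden_dim, factor_dim, styles, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.5  # arbitrary initialization scale for the sketch
        self.U = rng.normal(0.0, scale, (4, hidden_dim, factor_dim))  # shared
        self.V = rng.normal(0.0, scale, (4, factor_dim, input_dim))   # shared
        self.S = {  # one set of style factors per style
            s: rng.normal(0.0, scale, (4, factor_dim, factor_dim))
            for s in styles
        }
        self.W_h = rng.normal(0.0, scale, (4, hidden_dim, hidden_dim))  # shared
        self.b = np.zeros((4, hidden_dim))

    def step(self, x, h, c, style):
        """Run one time step for input x, state (h, c) and the chosen style."""
        # Rebuild the style-specific input weights: shape (4, hidden, input).
        W_x = self.U @ self.S[style] @ self.V
        pre = W_x @ x + self.W_h @ h + self.b  # shape (4, hidden)
        i = sigmoid(pre[0])   # input gate
        f = sigmoid(pre[1])   # forget gate
        o = sigmoid(pre[2])   # output gate
        g = np.tanh(pre[3])   # candidate cell state
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new


# Usage: the same input yields different hidden states under different styles.
cell = FactoredLSTMCell(input_dim=5, hidden_dim=4, factor_dim=3,
                        styles=["factual", "humorous"])
x, h0, c0 = np.ones(5), np.zeros(4), np.zeros(4)
h_factual, _ = cell.step(x, h0, c0, "factual")
h_humorous, _ = cell.step(x, h0, c0, "humorous")
```

Under this assumption, switching styles only swaps the small S matrices, so style-specific parameters can be learned from text alone while the shared factors carry the general language and image-grounding knowledge.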
[Figure: StyleNet framework]
[Figure: examples of generated captions]