title

Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach

notes

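A pretrained self-attention model is first used to remove the sentiment words from a sentence. Two RNNs then take over: one reconstructs a sentence with positive sentiment, the other a sentence with negative sentiment. The training objective is to make the reconstructed sentence as similar as possible to the original. Because removing sentiment words with the pretrained self-attention involves a discrete selection step, the paper uses RL to optimize the model parameters. The reward has two parts: the similarity to the original sentence, and the sentiment classifier's confidence in the sentence's sentiment.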
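The discrete removal step and the two-part reward can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the attention threshold, the unigram-F1 similarity, and the weighted-sum combination are all assumptions made for this example, and the paper may measure similarity and combine the two terms differently.

```python
from collections import Counter


def neutralize(tokens, attention, threshold):
    """Drop tokens whose attention weight exceeds the threshold.

    The hard keep/drop decision is the discrete step that blocks
    gradient flow. `threshold` is a hypothetical parameter.
    """
    return [t for t, a in zip(tokens, attention) if a <= threshold]


def overlap_similarity(candidate, reference):
    """Unigram F1 between two token lists (a stand-in similarity measure)."""
    if not candidate or not reference:
        return 0.0
    common = sum((Counter(candidate) & Counter(reference)).values())
    if common == 0:
        return 0.0
    precision = common / len(candidate)
    recall = common / len(reference)
    return 2 * precision * recall / (precision + recall)


def reward(candidate, reference, sentiment_confidence, weight=0.5):
    """Combine content similarity and classifier confidence.

    The weighted sum and `weight` are assumptions for illustration only.
    """
    similarity = overlap_similarity(candidate, reference)
    return weight * similarity + (1 - weight) * sentiment_confidence


# Toy example with made-up attention weights and classifier confidence.
source = ["the", "food", "is", "terrible"]
neutral = neutralize(source, [0.1, 0.2, 0.1, 0.6], threshold=0.3)
generated = ["the", "food", "is", "great"]
score = reward(generated, source, sentiment_confidence=0.9)
```

In a full RL setup, a scalar like `score` would weight the log-probability of the sampled output in a policy-gradient update, which is what lets training proceed despite the non-differentiable removal step.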

bibtex

link

https://arxiv.org/abs/1805.05181

publication

ACL 2018 (accepted as a long paper)

open source

https://github.com/lancopku/unpaired-sentiment-translation

affiliated

Peking University
