Closed · FayeXXX closed this issue 1 year ago
From the BLEU score, it seems that the output sentence is (probably) mostly copied from the input, which results in a low style score. You can first train on the lexical data and compare the results.
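One quick way to confirm the copying behavior is to compute BLEU of the model output against its own *input*: if that score is near 1.0, the model is copying rather than transferring style. A minimal sketch using NLTK's `sentence_bleu` (the sentences are illustrative, not from the dataset):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
inp = "the food was terrible and the service was slow".split()
out = "the food was terrible and the service was slow".split()  # output copied verbatim

# BLEU of the output against the input; near 1.0 means the model is copying
copy_bleu = sentence_bleu([inp], out, smoothing_function=smooth)
print(f"BLEU vs. input: {copy_bleu:.2f}")
```

Averaging this over the test set gives a simple "copy rate" signal to track alongside style accuracy.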
I've tried the lexical data too, but the result is almost the same. Here are the results:
```
Corpus mode: Yelp Pair mode: lexical
Epoch: 0 supervised loss decay: 1.0 Epoch: 0 BLEU Score: 0.40078558066773673 Style Accuracy: 0.74
Epoch: 1 supervised loss decay: 1.0 Epoch: 1 BLEU Score: 0.35988087240113614 Style Accuracy: 0.784
Epoch: 2 supervised loss decay: 0.21599999999999997 Epoch: 2 BLEU Score: 0.6427398017342613 Style Accuracy: 0.105
Epoch: 3 supervised loss decay: 0.1296 Epoch: 3 BLEU Score: 0.6485470740921879 Style Accuracy: 0.067
Epoch: 4 supervised loss decay: 0.07775999999999998 Epoch: 4 BLEU Score: 0.6477110423916957 Style Accuracy: 0.069
Epoch: 5 supervised loss decay: 0.04665599999999999 Epoch: 5 BLEU Score: 0.6497092100218357 Style Accuracy: 0.078
Epoch: 6 supervised loss decay: 0.027993599999999993 Epoch: 6 BLEU Score: 0.66097016482029 Style Accuracy: 0.109
Epoch: 7 supervised loss decay: 0.016796159999999994 Epoch: 7 BLEU Score: 0.6469795999288817 Style Accuracy: 0.093
Epoch: 8 supervised loss decay: 0.010077695999999997 Epoch: 8 BLEU Score: 0.6533797490076525 Style Accuracy: 0.091
Epoch: 9 supervised loss decay: 0.006046617599999997 Epoch: 9 BLEU Score: 0.6533700702244236 Style Accuracy: 0.097
```
You mentioned that "the output sentence is mostly copied from the input", and that actually occurred during my reproduction of other algorithms too. Do you have any idea how to fix it?
For the RL-based models, there are two rewards (style strength and content preservation). If the model is copying the input sentence, it is usually because the content-preservation reward / training signal is too strong.
1) You can set the decay base to a larger value. For instance, on line 181 of `main.py`, change
`model.supervised_loss_decay = 0.6 ** (train_epoch + 1)`
==> `model.supervised_loss_decay = 0.9 ** (train_epoch + 1)`
2) You can also try using the self-critic policy gradient (compare lines 262-266 in the `Model.py` file).
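For point 1), a quick sketch of what the base change does: with a base of 0.9 the supervised (content) signal decays much more slowly than with 0.6, so the style reward does not have to fight an overwhelming content-preservation term early on. Nothing below is from the repo itself; it just tabulates the two schedules.

```python
# Compare the two supervised-loss decay schedules from the suggestion above.
EPOCHS = 5
old = [0.6 ** (e + 1) for e in range(EPOCHS)]  # original base in main.py
new = [0.9 ** (e + 1) for e in range(EPOCHS)]  # suggested larger base

for e, (o, n) in enumerate(zip(old, new)):
    print(f"epoch {e}: base 0.6 -> {o:.4f}   base 0.9 -> {n:.4f}")
```

By epoch 2 the 0.6 schedule has already dropped to ~0.22 (matching the sharp accuracy collapse in your log), while the 0.9 schedule is still ~0.73.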
Thank you for your patience; it finally works. I followed the two steps you mentioned above, and during training the style accuracy is 0.92 and the BLEU score is 0.24. That makes sense.
Thank you again. You're one of the greatest instructors I've ever met on GitHub!
You can further adjust the aforementioned `supervised_loss_decay` to balance style accuracy and BLEU score. Usually you can get higher, more balanced scores (as reported in the paper) at some intermediate steps within each training epoch.
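One simple way to pick such a balanced checkpoint is to score each epoch by the geometric mean of BLEU and style accuracy, which penalizes a collapse in either metric. A hypothetical sketch (the numbers are illustrative, loosely patterned on the logs in this thread, not real results):

```python
import math

# epoch -> (BLEU, style accuracy); illustrative values only
results = {
    0: (0.2605, 0.819),
    1: (0.2449, 0.890),
    2: (0.6085, 0.100),
}

# Geometric mean rewards checkpoints where both metrics are reasonable.
best = max(results, key=lambda e: math.sqrt(results[e][0] * results[e][1]))
print(best)  # epoch 1 balances best on these illustrative numbers
```

Evaluating at intermediate steps within an epoch (not just at epoch boundaries) widens the pool of candidate checkpoints for this selection.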
I will close this issue.
Hi, I've got the testing result after your kind instructions, and thank you again. But the result is weird; here are the results:
```
Corpus mode: Yelp Pair mode: semantic
Epoch: 0 supervised loss decay: 1.0 Epoch: 0 BLEU Score: 0.2605128969907829 Style Accuracy: 0.819
Epoch: 1 supervised loss decay: 1.0 Epoch: 1 BLEU Score: 0.24486015740498765 Style Accuracy: 0.89
Epoch: 2 supervised loss decay: 0.21599999999999997 Epoch: 2 BLEU Score: 0.6085014884636862 Style Accuracy: 0.1
Epoch: 3 supervised loss decay: 0.1296 Epoch: 3 BLEU Score: 0.6381170286778461 Style Accuracy: 0.034
Epoch: 4 supervised loss decay: 0.07775999999999998 Epoch: 4 BLEU Score: 0.6362689594426708 Style Accuracy: 0.02
Epoch: 5 supervised loss decay: 0.04665599999999999 Epoch: 5 BLEU Score: 0.6410061139194893 Style Accuracy: 0.039
Epoch: 6 supervised loss decay: 0.027993599999999993 Epoch: 6 BLEU Score: 0.6210238010730351 Style Accuracy: 0.026
Epoch: 7 supervised loss decay: 0.016796159999999994 Epoch: 7 BLEU Score: 0.6368500951570953 Style Accuracy: 0.021
Epoch: 8 supervised loss decay: 0.010077695999999997 Epoch: 8 BLEU Score: 0.6366318866369625 Style Accuracy: 0.021
Epoch: 9 supervised loss decay: 0.006046617599999997 Epoch: 9 BLEU Score: 0.6354611202327463 Style Accuracy: 0.029
```
I have no idea what the problem is. Looking forward to your reply.