vineetjohn / linguistic-style-transfer

Neural network parametrized objective to disentangle and transfer style and content in text
Apache License 2.0
138 stars, 33 forks

non-reproducible results (yelp) #62

Closed Ulitochka closed 6 years ago

Ulitochka commented 6 years ago

Hello. Thank you for your work.

I tried to reproduce the results on the Yelp data. I used the latest version of the code and did not change the model parameters. The data is from: https://github.com/lijuncen/Sentiment-and-Style-Transfer/find/master I trained the w2v models and the style classifier (I was able to reproduce the classifier's quality).

But I get strange results:

{"word-overlap": 0.24729206951072533, "epoch": 26, "style-transfer": 0.4400895856662934, "content-preservation": 0.978698748140737}

I ran in DEBUG mode and got this output:

`09-05T18:16:44: validating label 0
2018-09-05 18:16:46.028885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-05 18:16:46.028951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-05 18:16:46.028975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-05 18:16:46.028996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-05 18:16:46.029157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5143 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
09-05T18:16:46: style_transfer_score: 0.0
09-05T18:16:46: confusion_matrix: [[1819 181] [ 0 0]]
09-05T18:16:46: Skipped lines: [] :-: []
09-05T18:16:46: Skipped lines: ['well', 'done'] :-: ['nope']
09-05T18:16:46: Skipped lines: ['foods', 'great'] :-: ['strike', 'num']
09-05T18:16:46: Skipped lines: ['great', 'wings'] :-: ['num']
09-05T18:16:46: Skipped lines: ['great'] :-: ['num']
09-05T18:16:46: Skipped lines: [] :-: []
09-05T18:16:46: Skipped lines: ['well', 'done'] :-: ['nope']
09-05T18:16:46: Skipped lines: ['great', 'subs'] :-: ['num']
09-05T18:16:46: 8 lines skipped due to errors
09-05T18:16:46: content_preservation_score: 0.9862392492341229
09-05T18:16:46: word_overlap_score: 0.27697874119639415

09-05T18:16:46: validating label 1
2018-09-05 18:16:48.106678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-05 18:16:48.106732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-05 18:16:48.106755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-05 18:16:48.106770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-05 18:16:48.106938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5143 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
09-05T18:16:48: style_transfer_score: 0.9301952393688153
09-05T18:16:48: confusion_matrix: [[ 0 0] [ 261 1739]]
09-05T18:16:48: Skipped lines: [] :-: ['do', 'yourself', 'a', 'favor', 'and', 'work', 'here']
09-05T18:16:48: Skipped lines: ['i', 'enjoyed', 'their', 'pollo', 'bowl'] :-: []
09-05T18:16:48: Skipped lines: ['great', 'place', 'to', 'take', 'a', 'family', 'or', 'business', 'partners'] :-: []
09-05T18:16:48: Skipped lines: [] :-: ['my', 'favorite', 'place', 'to', 'have', 'a', 'great', 'choice']
09-05T18:16:48: Skipped lines: ['some', 'nice', 'coins', 'priced', 'at', 'market', 'values'] :-: []
09-05T18:16:48: 5 lines skipped due to errors
09-05T18:16:48: content_preservation_score: 0.9648154781145771
09-05T18:16:48: word_overlap_score: 0.016776839826839827
09-05T18:16:48: Aggregate Style Transfer: 0.46509761968440766
09-05T18:16:48: Aggregate Content Preservation: 0.97552736367435
09-05T18:16:48: Aggregate Word Overlap: 0.14687779051161698`
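A quick sanity check on these numbers: each aggregate figure in the log matches the unweighted mean of the two per-label scores, so the low aggregate style transfer comes entirely from label 0 scoring 0.0. The sketch below only redoes that arithmetic with the values copied from the log. The averaging rule is inferred from the numbers and is an assumption, not taken from the repository code, and the helper name `unweighted_mean` is hypothetical.

```python
import math

# Per-label scores copied from the DEBUG log: (label 0, label 1).
per_label = {
    "style_transfer": (0.0, 0.9301952393688153),
    "content_preservation": (0.9862392492341229, 0.9648154781145771),
    "word_overlap": (0.27697874119639415, 0.016776839826839827),
}

# Aggregate values as printed at the end of the log.
logged_aggregate = {
    "style_transfer": 0.46509761968440766,
    "content_preservation": 0.97552736367435,
    "word_overlap": 0.14687779051161698,
}


def unweighted_mean(scores):
    """Plain arithmetic mean (assumed aggregation rule, hypothetical helper)."""
    return sum(scores) / len(scores)


aggregates = {name: unweighted_mean(s) for name, s in per_label.items()}

for name, value in aggregates.items():
    # Compare the recomputed mean against the value the log reported.
    matches = math.isclose(value, logged_aggregate[name], rel_tol=1e-9)
    print(f"{name}: recomputed={value!r} matches_log={matches}")
```

If the averaging assumption holds, the thing to investigate is the label 0 direction on its own, not the aggregate score.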

Could you help me, please?

vineetjohn commented 6 years ago

My apologies for the delayed response.

I used the Yelp data splits shared in the repository below, and reported scores on the test set: https://github.com/shentianxiao/language-style-transfer

If you're using different Yelp data splits, the hyperparameter tuning and the number of epochs to run would depend on the dev accuracy you get on that particular split.
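One way to act on this is to pick the epoch from per-epoch dev metrics. The following is a minimal sketch, assuming each validation run emits one JSON record per epoch in the format quoted earlier in this thread. The function name `best_epoch` and the choice of `style-transfer` as the selection metric are hypothetical illustrations, not something this repository provides.

```python
import json


def best_epoch(json_lines, metric="style-transfer"):
    """Return the validation record with the highest value for `metric`.

    `json_lines` is an iterable of JSON strings, one record per epoch.
    The selection rule (maximise a single metric) is an assumption.
    """
    records = [json.loads(line) for line in json_lines if line.strip()]
    if not records:
        raise ValueError("no validation records found")
    return max(records, key=lambda record: record[metric])


# The only record quoted in this thread; a real run would have one per epoch.
logged = [
    '{"word-overlap": 0.24729206951072533, "epoch": 26, '
    '"style-transfer": 0.4400895856662934, '
    '"content-preservation": 0.978698748140737}'
]

chosen = best_epoch(logged)
print(chosen["epoch"], chosen["style-transfer"])
```

In practice style transfer trades off against content preservation, so a combined criterion on the dev set may be more appropriate than a single metric.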

I hope that helps.