vineetjohn / linguistic-style-transfer

Neural network parametrized objective to disentangle and transfer style and content in text
Apache License 2.0

report a big PPL score on yelp #71

Open hejunqing opened 4 years ago

hejunqing commented 4 years ago

Hello, and thanks for sharing your work. I trained a model following the steps in the README and ran the evaluation with run_all_evaluator.sh. Most of the metrics are identical to the results reported in your paper, except PPL. The results for my trained model are:

ll_scores: [(-9.701861720617387, 106.5074394250216), (-10.269295644873736, 120.9065905583248)]

The mean PPL is 113.7, but it should be around 32. I think the difference may be due to a different vocabulary or to training KenLM on a different corpus. I used yelp_corpus_adapter directly for data preparation and yelp/reviews-train.txt to train KenLM. Did I miss something?
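For anyone checking the arithmetic, here is a minimal sketch of how the mean PPL of 113.7 follows from the ll_scores output above. It assumes each tuple is (log-likelihood, perplexity), one tuple per evaluated file; that reading of the tuple layout is an assumption based on the values, not something confirmed against the repository code.

```python
# Values copied from the ll_scores output quoted in this thread.
# Assumed layout of each tuple: (log_likelihood, perplexity).
ll_scores = [
    (-9.701861720617387, 106.5074394250216),
    (-10.269295644873736, 120.9065905583248),
]

# Keep only the perplexity component of each tuple.
ppl_values = [ppl for _log_likelihood, ppl in ll_scores]

# Arithmetic mean over the evaluated files.
mean_ppl = sum(ppl_values) / len(ppl_values)

print(f"mean PPL: {mean_ppl:.1f}")
```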

vrublack commented 4 years ago

I have the same issue. I also tried training the language model on the dev and test splits, but got a similar PPL. Note that the overall_evaluator.py script needs two changes: line 62 should become ll_score, ppl_score = language_fluency.score_generated_sentences(generated_text_file_path, options.language_model_path) and line 68 should become ll_scores.append(ppl_score). Without these changes the script outputs a tuple of log likelihood (a negative value) and perplexity instead of the perplexity alone (this might have something to do with KenLM versions).
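The change described above amounts to unpacking the returned tuple and collecting only the perplexity. A rough sketch of that pattern follows; score_generated_sentences here is a hypothetical stand-in that only mimics the (log-likelihood, perplexity) return shape, and the file names, model path, and rounded scores are illustrative placeholders rather than values from the repository.

```python
def score_generated_sentences(generated_text_file_path, language_model_path):
    """Hypothetical stand-in for language_fluency.score_generated_sentences.

    Returns a (log_likelihood, perplexity) tuple; the numbers are rounded
    from the output quoted earlier in this thread, not computed by KenLM.
    """
    fake_results = {
        "style_0.txt": (-9.70, 106.51),
        "style_1.txt": (-10.27, 120.91),
    }
    return fake_results[generated_text_file_path]


ppl_scores = []
for path in ["style_0.txt", "style_1.txt"]:
    # Unpack the tuple instead of appending it whole.
    ll_score, ppl_score = score_generated_sentences(path, "placeholder_lm.binary")
    ppl_scores.append(ppl_score)

# The list now holds plain floats, so averaging works directly.
mean_ppl = sum(ppl_scores) / len(ppl_scores)
print(ppl_scores, round(mean_ppl, 2))
```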