ruidan / Unsupervised-Aspect-Extraction

Code for the ACL 2017 paper "An Unsupervised Neural Attention Model for Aspect Extraction"
Apache License 2.0

Questions about the results #11

Open RJzz opened 6 years ago

RJzz commented 6 years ago

Hello, I tried to evaluate the uploaded pre-trained restaurant model by running evaluation.py directly, without making any changes to the code, but I did not get the same results as the paper. Is it necessary to tune some parameters? Thank you for your answer. This is the result I got:

--- Results on restaurant domain ---

                    precision    recall  f1-score   support

         Food           0.654     0.493     0.562       887
        Staff           0.406     0.270     0.324       352
     Ambience           0.245     0.143     0.181       251
    Anecdotes           0.000     0.000     0.000         0
        Price           0.000     0.000     0.000         0
Miscellaneous           0.000     0.000     0.000         0

  avg / total           0.527     0.381     0.442      1490
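As an aside for anyone reproducing this: the table above has the layout of scikit-learn's classification_report, which evaluation.py appears to use for scoring. A minimal, self-contained sketch of how such a report is produced (the gold and predicted labels below are invented toy data, not the repo's):

    from sklearn.metrics import classification_report

    # Toy gold/predicted aspect labels, for illustration only.
    y_true = ['Food', 'Staff', 'Food', 'Ambience', 'Food']
    y_pred = ['Food', 'Food', 'Food', 'Ambience', 'Staff']

    # digits=3 matches the three-decimal formatting in the tables above.
    print(classification_report(y_true, y_pred, digits=3))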

ruidan commented 6 years ago

All the parameters are already set correctly by default and there is no need to tune them. If you follow the evaluation instructions in the README, the results should be exactly as follows (I just checked again by downloading the current code):

[screenshot: evaluation results on the restaurant domain]

Did you uncomment line 28 in evaluation.py to set the model to be evaluated as the pre-trained model?
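For context, what that uncommented line accomplishes is loading the shipped pre-trained weights before evaluation. A hedged Keras sketch of the idea (the two-layer model here is a stand-in for the one model.py builds, and the weight path is an assumption, not the repo's exact line 28):

    from keras.layers import Dense, Input
    from keras.models import Model

    # Stand-in model; evaluation.py builds the real one from model.py.
    inp = Input(shape=(10,))
    out = Dense(2)(inp)
    model = Model(inp, out)

    # Hypothetical path to the released restaurant weights; adjust to
    # wherever the pre-trained model actually lives in the repo.
    model.load_weights('../pre_trained_model/restaurant/model_param')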

RJzz commented 6 years ago

Yes, I uncommented line 28 in evaluation.py to set the model to be evaluated as the pre-trained model, and I only changed the imports in model.py as follows:

    import logging
    import os
    os.environ['KERAS_BACKEND'] = 'theano'
    import keras.backend as K
    K.set_image_dim_ordering('th')
    import importlib
    importlib.reload(K)
    from keras.layers import Dense, Activation, Embedding, Input
    from keras.models import Model
    from my_layers import Attention, Average, WeightedSum, WeightedAspectEmb, MaxMargin

Shouldn't I be doing this? This is the result after downloading and running again:

                    precision    recall  f1-score   support

         Food           0.855     0.729     0.787       887
        Staff           0.792     0.636     0.706       352
     Ambience           0.781     0.470     0.587       251
    Anecdotes           0.000     0.000     0.000         0
        Price           0.000     0.000     0.000         0
Miscellaneous           0.000     0.000     0.000         0

  avg / total           0.827     0.664     0.734      1490

Thanks a lot!

ruidan commented 6 years ago

1) It seems you are using Python 3. The code was tested under Python 2.7, and I am not sure whether this will affect the results. Also check the versions of the other dependencies.

2) Did you download the preprocessed datasets directly, or did you preprocess the original datasets again using the provided script? You should use the preprocessed dataset with the saved model. If you preprocess the dataset again, the word indexes may not exactly match the saved word embeddings, as I don't remember whether I changed something in the preprocessing script when cleaning up the code.
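To illustrate why re-running preprocessing can silently invalidate a saved model, here is a toy sketch (the vocabularies below are invented; the repo's real word-to-index mapping comes from its preprocessing output):

    # The embedding matrix is indexed by the vocabulary's word -> id map,
    # so a saved model is only valid for the exact mapping it was trained with.
    vocab_train = {'food': 1, 'staff': 2, 'price': 3}  # mapping at training time
    vocab_rerun = {'price': 1, 'food': 2, 'staff': 3}  # mapping from a re-run

    # Row 1 of the saved embedding matrix encodes 'food' under vocab_train,
    # but re-preprocessed data would feed 'price' as id 1, so every lookup
    # silently retrieves the wrong vector.
    assert vocab_rerun['price'] == vocab_train['food']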

RJzz commented 6 years ago

Thanks! I tried Python 2.7 and got a similar result.

ilivans commented 6 years ago

Hi @ruidan, great job! I wonder why the results above differ from those reported in the paper. [screenshot: results table from the paper] Is it the random seed or something else? Can we reproduce the same results as presented in the paper?

ThiagoSousa commented 6 years ago

I also tried to replicate the results and got the same as @ruidan: close to the paper, but not exactly the same. Was a different training set or different parameters used in the paper?

[screenshot: replicated evaluation results]

EDIT: I checked the default parameters in the code and they are pretty much the same as the paper's. The paper mentions that the reported results are an average of 10 runs; @ilivans, that might explain the difference.

EDIT2: Disregard my other questions; I found the answer in the paper.
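For completeness, a minimal sketch of producing a paper-style average over 10 runs (train_and_evaluate below is a hypothetical stand-in for a train.py-then-evaluation.py cycle, and the returned scores are placeholders, not real results):

    import numpy as np

    def train_and_evaluate(seed):
        # Hypothetical stand-in: train with this seed, evaluate, and
        # return the overall F1. The value below is a placeholder.
        rng = np.random.RandomState(seed)
        return 0.73 + rng.normal(0.0, 0.01)

    scores = [train_and_evaluate(seed) for seed in range(10)]
    print("F1 over 10 runs: %.3f +/- %.3f" % (np.mean(scores), np.std(scores)))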

ilivans commented 6 years ago

@ThiagoSousa thank you! Shame on me for not noticing that detail.