RJzz opened this issue 6 years ago
All the parameters are already set correctly by default, and there is no need to tune anything. If you follow the evaluation instructions in the README, the results should be exactly as follows (I just checked again by downloading the current code):
Did you uncomment line 28 in evaluation.py to set the model to be evaluated to the pre-trained model?
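For readers unfamiliar with the script, the line-28 toggle is essentially a path switch. A rough sketch of its shape; the variable name and paths here are assumptions, not copied verbatim from evaluation.py:

```python
# Rough sketch of the evaluation.py toggle (variable name and paths are
# assumptions, not verbatim from the repo). By default the script evaluates
# the model trained into your own output directory; uncommenting the second
# line points it at the shipped pre-trained weights instead.
out_dir = '../output_dir/' + domain
# out_dir = '../pre_trained_model/' + domain
```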
Yes, I uncommented line 28 in evaluation.py to set the model to be evaluated to the pre-trained model, and I only changed the imports in model.py as follows:

```python
import logging
import os
os.environ['KERAS_BACKEND'] = 'theano'
import keras.backend as K
K.set_image_dim_ordering('th')
import importlib
importlib.reload(K)  # importlib.reload exists only in Python 3; Python 2 has the reload() builtin
from keras.layers import Dense, Activation, Embedding, Input
from keras.models import Model
from my_layers import Attention, Average, WeightedSum, WeightedAspectEmb, MaxMargin
```
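As an aside, the reload is only needed when keras is imported before the backend variable is set. A minimal sketch of the simpler ordering, assuming nothing imports keras earlier in the process:

```python
# Set the backend before the first keras import and no reload is needed
# (assumes no module has imported keras earlier in the process).
import os
os.environ['KERAS_BACKEND'] = 'theano'

import keras.backend as K
K.set_image_dim_ordering('th')  # Theano-style channel ordering
```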
Shouldn't I be doing this? This is the result of my downloading and running the code again:

```
               precision    recall  f1-score   support

         Food      0.855     0.729     0.787       887
        Staff      0.792     0.636     0.706       352
     Ambience      0.781     0.470     0.587       251
    Anecdotes      0.000     0.000     0.000         0
        Price      0.000     0.000     0.000         0
Miscellaneous      0.000     0.000     0.000         0

  avg / total      0.827     0.664     0.734      1490
```
Thanks a lot!
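A note on the 0.000 rows above: the gold test set only labels Food, Staff, and Ambience (their supports, 887 + 352 + 251, account for all 1490 test sentences), so the remaining classes have zero support, and scikit-learn's classification_report still prints them. A minimal illustration:

```python
# Minimal illustration: labels listed in `labels` but absent from both
# y_true and y_pred get 0.000 precision/recall/f1 and support 0.
from sklearn.metrics import classification_report

y_true = ['Food', 'Staff', 'Ambience', 'Food']
y_pred = ['Food', 'Staff', 'Food', 'Food']

print(classification_report(
    y_true, y_pred,
    labels=['Food', 'Staff', 'Ambience', 'Anecdotes', 'Price', 'Miscellaneous'],
    digits=3))
```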
1) It seems you are using Python 3. The code was tested under Python 2.7, and I am not sure whether this affects the results. Also check the versions of the other dependencies.
2) Did you download the preprocessed datasets directly, or did you preprocess the original datasets again using the script provided? You should use the preprocessed datasets with the saved model. If you preprocess the datasets again, the word indexes may not exactly match the saved word embeddings, as I don't really remember whether I changed something in the preprocessing script when cleaning up the code. A quick consistency check is sketched below.
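One way to catch the mismatch described in 2) is to compare the size of the vocabulary produced by preprocessing against the first dimension of the saved embedding matrix. The file paths below are assumptions for illustration; adapt them to the actual preprocessed-data and model layout:

```python
# Sanity check (file paths are hypothetical; adapt to your layout):
# if re-running the preprocessing changed the word indexing, the vocabulary
# and the saved embedding matrix will disagree, and every embedding lookup
# is silently wrong even though the code still runs.
import codecs
import numpy as np

with codecs.open('preprocessed_data/restaurant/vocab', 'r', 'utf-8') as f:
    vocab_size = sum(1 for line in f if line.strip())

word_emb = np.load('pre_trained_model/restaurant/word_emb.npy')  # hypothetical file
print('vocab size: %d, embedding rows: %d' % (vocab_size, word_emb.shape[0]))
assert vocab_size == word_emb.shape[0], 'vocabulary and saved embeddings disagree'
```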
Thanks! I tried Python 2.7 and got a similar result.
Hi @ruidan, great job! I wonder why the results above are different from those reported in the paper. Is it the random seed or something else? Can we reproduce the same results as presented in the paper?
I also tried to replicate the results and got the same numbers as @ruidan. They are close to the paper's, but not identical. Was a different training set or different parameters used in the paper?
EDIT: I checked the default parameters in the code and they are pretty much the same as the paper's. The paper mentions that the reported results are an average of 10 runs, which might explain the difference, @ilivans (see the sketch below).
EDIT2: Disregard my other questions; I found the answer in the paper.
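For anyone wanting to follow the paper's averaging protocol, a minimal sketch is below. `train_and_evaluate` is a hypothetical wrapper around the repo's train.py and evaluation.py, not an existing function, and the seed handling is likewise an assumption about how you wire it up:

```python
# Average F1 over 10 seeded runs, as the paper reports.
# train_and_evaluate is a hypothetical wrapper, not a repo function.
import numpy as np

f1_scores = []
for seed in range(10):
    np.random.seed(seed)
    f1_scores.append(train_and_evaluate(domain='restaurant', seed=seed))

print('mean F1 over 10 runs: %.3f (std %.3f)' % (np.mean(f1_scores), np.std(f1_scores)))
```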
@ThiagoSousa thank you! Shame on me for not noticing that detail.
Hello, I tried to evaluate the uploaded pre-trained restaurant model by running evaluation.py directly, without making any changes to the code, but I did not get the same results as the paper. Is it necessary to tune some parameters? Thank you for your answer. This is the result I got:

--- Results on restaurant domain ---

```
               precision    recall  f1-score   support

Miscellaneous      0.000     0.000     0.000         0

  avg / total      0.527     0.381     0.442      1490
```