tensorflow / skflow

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
Apache License 2.0

saved model returns different prediction results #135

Closed: GameOfThrow closed this issue 8 years ago.

GameOfThrow commented 8 years ago

I'm new to skflow, so this might be me being stupid: I trained an RNN using the example data and saved it using classifier.save(model_path).

I also dumped out the prediction results using: pandas.DataFrame(classifier.predict(X_test)).to_csv

This all works fine, and I get an accuracy of roughly 80%.

Next, I loaded the existing model using classifier = skflow.TensorFlowEstimator.restore(model_path) and also loaded the same test file.

I passed the same test file through the VocabularyProcessor and generated the NumPy array: X_test = np.array(list(vocab_processor.transform(X_test)))

I then ran the prediction again with classifier.predict(X_test), but the accuracy is now only around 35%, quite a lot different from the result I got when training the model. Can anyone help me understand what's going on here?
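One way to narrow this down is to compare the predictions dumped at training time with those from the restored model on the same test file; heavy disagreement on identical input text points at either the model or the preprocessing. A minimal sketch, assuming integer class labels and using Python's csv module as a stand-in for the pandas to_csv call above. save_predictions, load_predictions and agreement are hypothetical helpers, and the labels are made up for illustration.

```python
import csv
import os
import tempfile
from typing import List, Sequence


def save_predictions(path: str, preds: Sequence[int]) -> None:
    """Write one predicted label per row under a header (stand-in for DataFrame.to_csv)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction"])
        for p in preds:
            writer.writerow([int(p)])


def load_predictions(path: str) -> List[int]:
    """Read labels written by save_predictions, skipping the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the "prediction" header
        return [int(row[0]) for row in reader]


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two prediction runs give the same label."""
    if len(a) != len(b):
        raise ValueError("prediction runs have different lengths")
    if len(a) == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical labels: first run at training time, second after restoring the model.
path = os.path.join(tempfile.mkdtemp(), "preds.csv")
save_predictions(path, [1, 0, 2, 1])
print(agreement(load_predictions(path), [1, 2, 0, 1]))  # 0.5
```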

EDIT--

After some exploring, I found out the problem is the VocabularyProcessor: when I rerun my data, the vocabulary is re-labelled from 1 to N instead of keeping the same IDs it had when I first trained the model. Is there a way to label my vocabulary consistently when reloading a model file?
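The re-labelling is easy to reproduce without TensorFlow. The sketch below uses hypothetical helpers build_vocab and transform as a simplified stand-in for VocabularyProcessor (assumption: IDs are assigned in first-seen order starting at 1, with 0 used for unknown tokens and padding; the real class may differ in details). Fitting a fresh vocabulary on the test file gives the same words different IDs than the vocabulary the model was trained with, so the model effectively sees different input.

```python
from typing import Dict, List


def build_vocab(docs: List[str]) -> Dict[str, int]:
    """Assign IDs 1..N to whitespace tokens in first-seen order (0 is left for unknown/padding)."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in doc.split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def transform(docs: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Map tokens to IDs, truncate to max_len, and pad with 0."""
    out = []
    for doc in docs:
        ids = [vocab.get(tok, 0) for tok in doc.split()][:max_len]
        out.append(ids + [0] * (max_len - len(ids)))
    return out


train_docs = ["the cat sat", "a dog ran"]
test_docs = ["a cat ran"]

train_vocab = build_vocab(train_docs)  # vocabulary the model was trained with
refit_vocab = build_vocab(test_docs)   # what re-fitting on the test file produces

print(transform(test_docs, train_vocab, 4))  # [[4, 2, 6, 0]]
print(transform(test_docs, refit_vocab, 4))  # [[1, 2, 3, 0]]  "a" and "ran" changed IDs
```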

ilblackdragon commented 8 years ago

You should save vocab_processor along with the model. I'm going to add explicit save/restore methods per #130, but for now you can just pickle it and unpickle it at inference time.
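A minimal sketch of the pickle approach, using the standard-library pickle module. SimpleVocab is a hypothetical stand-in for a fitted VocabularyProcessor, and save_vocab, load_vocab and the file name vocab.pkl are likewise illustrative. The processor is fitted once at training time and pickled next to the model; at inference time only transform is called on the unpickled object, never fit.

```python
import os
import pickle
import tempfile
from typing import Dict, List


class SimpleVocab:
    """Hypothetical stand-in for a fitted VocabularyProcessor: a token -> ID mapping."""

    def __init__(self, max_len: int) -> None:
        self.max_len = max_len
        self.vocab: Dict[str, int] = {}

    def fit(self, docs: List[str]) -> "SimpleVocab":
        # New tokens get the next ID (1..N); existing tokens keep theirs.
        for doc in docs:
            for tok in doc.split():
                self.vocab.setdefault(tok, len(self.vocab) + 1)
        return self

    def transform(self, docs: List[str]) -> List[List[int]]:
        # Unknown tokens map to 0; sequences are truncated/padded to max_len.
        out = []
        for doc in docs:
            ids = [self.vocab.get(tok, 0) for tok in doc.split()][: self.max_len]
            out.append(ids + [0] * (self.max_len - len(ids)))
        return out


def save_vocab(vp: SimpleVocab, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(vp, f)


def load_vocab(path: str) -> SimpleVocab:
    with open(path, "rb") as f:
        return pickle.load(f)


# Training time: fit once, then pickle alongside the model.
train_docs = ["the cat sat", "a dog ran"]
vp = SimpleVocab(max_len=4).fit(train_docs)
path = os.path.join(tempfile.mkdtemp(), "vocab.pkl")
save_vocab(vp, path)

# Inference time: unpickle and only transform.
test_docs = ["a cat ran"]
restored = load_vocab(path)
print(restored.transform(test_docs))  # [[4, 2, 6, 0]]
```

pickle stores instances by reference to their class, so the class (here SimpleVocab, or the real VocabularyProcessor) must be importable in the inference script. Only unpickle files you trust.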

GameOfThrow commented 8 years ago

Pickle works well, thanks for the response.