openai / generating-reviews-discovering-sentiment

Code for "Learning to Generate Reviews and Discovering Sentiment"
https://arxiv.org/abs/1704.01444
MIT License

How to 'use' this model in our own project? #56

Open GeekyPM07 opened 6 years ago

GeekyPM07 commented 6 years ago

I can't find a way to incorporate this model into my code. I just need to get sentiment scores for some feedback. As this model is pre-trained, it would be of much help. How do I do this? Thanks!

pgurazada commented 5 years ago

I have used this code in a recent project on customer feedback. Here is what I did.

import pandas as pd
from encoder import Model  # encoder.py ships with this repository

sentiment_model = Model()

# All the feedback text was placed in a pandas data frame
data_df = pd.read_csv('/path/to/samples/')

samples = list(data_df['samples'])

# One row of features per sample
text_features = sentiment_model.transform(samples)

# Unit 2388 is the sentiment neuron
sentiment_scores = text_features[:, 2388]

data_df['sentiment_scores'] = sentiment_scores

data_df.to_csv('/path/to/output_dir')



Note that the code runs significantly faster on a GPU.

massimosclaw commented 5 years ago

@pgurazada I'm a total beginner to python and pandas trying to apply this code to my csv. If possible, would love any help whatsoever as to why I'm getting this error.

I replaced '/path/to/samples/' with a file named 'samples.csv' in the same directory.

Warning (from warnings module):
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 14
    warnings.warn(msg, category=DeprecationWarning)
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.

Warning (from warnings module):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py", line 15
    warnings.warn(msg, category=DeprecationWarning)
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'samples'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 23, in <module>
    samples = list(data_df['samples'])
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'samples'

massimosclaw commented 5 years ago

@pgurazada I should mention I also don't know how to convert csv to pandas dataframe

pgurazada commented 5 years ago

@massimosclaw

Firstly, pd.read_csv() will read the csv file into a pandas data frame. So you will not need any extra work on that. Secondly, it appears to me that your data frame does not have a column named 'samples' and the subsetting step data_df['samples'] fails (hence the KeyError). Kindly check if you have a column named 'samples' in your csv.

I understand where you are coming from. I am sure this link would help you grok pandas better - https://tomaugspurger.github.io/modern-1-intro.html
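
The column check suggested above can be sketched as follows. This is only an illustration: a small in-memory CSV stands in for samples.csv, and the column name 'feedback' and the helper pick_text_column are hypothetical, not part of this repository.

```python
import io

import pandas as pd

# Stand-in for samples.csv; note the header is 'feedback', not 'samples'
csv_text = "feedback,date\nGreat product,2019-01-01\nTerrible support,2019-01-02\n"
data_df = pd.read_csv(io.StringIO(csv_text))

# List the column names pandas actually found in the file
print(list(data_df.columns))


def pick_text_column(df, name):
    """Return the named column as a list, with a clearer error if it is missing."""
    if name not in df.columns:
        raise KeyError(f"No column {name!r}; available columns: {list(df.columns)}")
    return list(df[name])


samples = pick_text_column(data_df, "feedback")
```

If the printed list does not contain the name used in data_df['...'], either rename the header in the CSV or use the name that is actually there.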

massimosclaw commented 5 years ago

@pgurazada Thank you so much! And thank you for the resource, will definitely check it out. Now it's giving me this error.

Traceback (most recent call last):
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 10, in <module>
    text_features = model.transform(samples)
NameError: name 'model' is not defined

It seems to me this means the (variable?) 'model' hasn't been given a value? I'm not sure what value to give it though...

massimosclaw commented 5 years ago

I saw in someone else's example: https://github.com/ModelDepot/Sentiment-Neuron-Demonstration/blob/master/Sentiment_Neuron.ipynb that they defined model = Model(), and noticed you defined it as sentiment_model.

So I also tried replacing

text_features = model.transform(samples)

with

text_features = sentiment_model.transform(samples)

and got this:

Traceback (most recent call last):
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 12, in <module>
    text_features = sentiment_model.transform(samples)
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/encoder.py", line 156, in transform
    xs = [preprocess(x) for x in xs]
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/encoder.py", line 156, in <listcomp>
    xs = [preprocess(x) for x in xs]
  File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/utils.py", line 53, in preprocess
    text = html.unescape(text)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/__init__.py", line 130, in unescape
    if '&' not in s:
TypeError: argument of type 'float' is not iterable

pgurazada commented 5 years ago

@massimosclaw Thank you for pointing out the error with sentiment_model; I have now corrected my original comment. The TypeError means the encoder's preprocess step received a float where it expected a string. That usually points to missing values: pandas reads empty cells as NaN, which is a float. Would you be able to check whether there are any missing values? Any other data sanity checks, such as whether the strings are all encoded properly, would also help.
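
A minimal sketch of such a missing-value check, assuming the text lives in a column named 'samples' as in the earlier snippet. The in-memory CSV and its 'id' column are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for samples.csv; row 2 has an empty 'samples' cell
csv_text = "id,samples\n1,Loved it\n2,\n3,Not great\n"
data_df = pd.read_csv(io.StringIO(csv_text))

# Empty cells are read as NaN, which is a float rather than a string
n_missing = int(data_df["samples"].isna().sum())
print(f"{n_missing} missing value(s) in 'samples'")

# Drop rows without text and make sure every remaining entry is a str
clean_df = data_df.dropna(subset=["samples"]).copy()
clean_df["samples"] = clean_df["samples"].astype(str)

samples = clean_df["samples"].tolist()
```

Passing the cleaned list to sentiment_model.transform avoids handing a float to the preprocessing step. Whether to drop the empty rows or fill them with a placeholder string depends on the data.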

massimosclaw commented 5 years ago

@pgurazada No prob!

Unfortunately, I don't know how to check if there are any missing values (or what that means exactly... empty cells?) as well as data sanity checks on whether the strings are all encoded properly... don't know what that means either. Anything in particular I should search for to learn more about that?

Will do some googling around with those terms...

massimosclaw commented 5 years ago

Found this... https://towardsdatascience.com/data-cleaning-with-python-and-pandas-detecting-missing-values-3e9c6ebcf78b - will read up on it.

massimosclaw commented 5 years ago

@pgurazada I finally managed to get it to work just by deleting all other columns which contained links, dates, times, and other data. Thank you so much again for the help.

Wanted to ask one last question... how do you get the code to run on your GPU? As it takes a long time to run on my CPU.

pgurazada commented 5 years ago

> @pgurazada I finally managed to get it to work just by deleting all other columns which contained links, dates, times, and other data. Thank you so much again for the help.
>
> Wanted to ask one last question... how do you get the code to run on your GPU? As it takes a long time to run on my CPU.

I am so sorry, I missed this for some reason. On a GPU, predictions are about 10x faster. I was using the standard Colaboratory GPUs.
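
One way to confirm that TensorFlow can actually see a GPU before running the model is sketched below. It is a rough check rather than anything specific to this repository; the helper name visible_gpu_names is hypothetical, and the function simply returns an empty list when TensorFlow is unavailable.

```python
def visible_gpu_names():
    """Return the names of GPU devices TensorFlow can see, or [] if none."""
    try:
        from tensorflow.python.client import device_lib
    except Exception:
        # TensorFlow is missing or failed to load in this environment
        return []
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


gpus = visible_gpu_names()
print(gpus if gpus else "No GPU visible; the model will run on the CPU")
```

In Colaboratory, a GPU normally has to be selected in the notebook's runtime settings first; once it is, the list above should no longer be empty.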

divyag11 commented 5 years ago

Could you kindly tell me why you use index 2388 in "sentiment_scores = text_features[:, 2388]"? What is its purpose?

pgurazada commented 5 years ago

@divyag11 Please check out the original paper from OpenAI. Column 2388 of the feature matrix is the activation of the sentiment neuron described there.
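
To illustrate the indexing, here is a small sketch with a random array standing in for the output of sentiment_model.transform. It assumes one row per sample and 4096 columns (the paper describes a 4096-unit model). Thresholding the activation at 0 to get a label is an assumption made for this example, not something prescribed in this thread.

```python
import numpy as np

SENTIMENT_UNIT = 2388  # index of the sentiment neuron used in this thread

# Hypothetical stand-in for sentiment_model.transform(samples)
rng = np.random.default_rng(0)
text_features = rng.normal(size=(3, 4096)).astype(np.float32)
text_features[:, SENTIMENT_UNIT] = [1.2, -0.8, 0.1]  # made-up activations

# One activation per sample: a single column of the feature matrix
sentiment_scores = text_features[:, SENTIMENT_UNIT]

# Assumed rule for illustration: positive activation means positive sentiment
labels = np.where(sentiment_scores > 0, "positive", "negative")
print(list(zip(sentiment_scores.tolist(), labels.tolist())))
```

The raw activation can also be kept as a continuous score, which is what the earlier snippet writes to the CSV.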

divyag11 commented 5 years ago

Okay, thanks for your reply.