Open GeekyPM07 opened 6 years ago
I have used this code in a recent project on customer feedback. Here is what I did. I created a file named sentiment-extractor.py in this repo folder and filled in the following code:
import pandas as pd
from encoder import Model
sentiment_model = Model()
data_df = pd.read_csv('/path/to/samples/') # all the feedback text was placed in a pandas data frame
samples = list(data_df['samples'])
text_features = sentiment_model.transform(samples)
sentiment_scores = text_features[:, 2388]
data_df['sentiment_scores'] = sentiment_scores
data_df.to_csv('/path/to/output_dir')
Note that the code runs significantly faster on a GPU.
@pgurazada I'm a total beginner to python and pandas trying to apply this code to my csv. If possible, would love any help whatsoever as to why I'm getting this error.
I replaced '/path/to/samples/' with a file named 'samples.csv' in the same directory.
Warning (from warnings module):
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 14
warnings.warn(msg, category=DeprecationWarning)
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py", line 15
warnings.warn(msg, category=DeprecationWarning)
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
WARNING:tensorflow:From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'samples'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 23, in <module>
samples = list(data_df['samples'])
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'samples'
@pgurazada I should mention I also don't know how to convert a csv to a pandas data frame.
@massimosclaw
Firstly, pd.read_csv() will read the csv file into a pandas data frame. So you will not need any extra work on that. Secondly, it appears to me that your data frame does not have a column named 'samples' and the subsetting step data_df['samples'] fails (hence the KeyError). Kindly check if you have a column named 'samples' in your csv.
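One way to run that check is to print the column names pandas parsed from the header row. This is only a sketch: the in-memory CSV and the column name 'feedback_text' are made up for illustration and stand in for whatever samples.csv actually contains.

```python
import io

import pandas as pd

# Hypothetical stand-in for samples.csv: the text column here is called
# 'feedback_text', not 'samples', which reproduces the KeyError situation.
csv_text = "feedback_text,date\nGreat service,2019-01-05\nToo slow,2019-01-06\n"
data_df = pd.read_csv(io.StringIO(csv_text))

# Show the column names pandas found in the header row.
print(list(data_df.columns))

# Check before subsetting instead of letting data_df['samples'] raise.
if 'samples' not in data_df.columns:
    # Rename the real text column so the rest of the script works unchanged.
    data_df = data_df.rename(columns={'feedback_text': 'samples'})

samples = data_df['samples'].tolist()
```

Renaming the header in the csv file itself would work just as well; the rename here only keeps the example self-contained.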
I understand where you are coming from. I am sure this link would help you grok pandas better - https://tomaugspurger.github.io/modern-1-intro.html
@pgurazada Thank you so much! And thank you for the resource, will definitely check it out. Now it's giving me this error.
Traceback (most recent call last):
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 10, in <module>
text_features = model.transform(samples)
NameError: name 'model' is not defined
It seems to me this means the (variable?) 'model' hasn't been given a value? I'm not sure what value to give it though...
I saw in someone else's example: https://github.com/ModelDepot/Sentiment-Neuron-Demonstration/blob/master/Sentiment_Neuron.ipynb that they defined model = Model(), and noticed you defined it as sentiment_model.
So I also tried replacing
text_features = model.transform(samples)
with
text_features = sentiment_model.transform(samples)
And got this:
Traceback (most recent call last):
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/sentiment-extractor.py", line 12, in <module>
text_features = sentiment_model.transform(samples)
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/encoder.py", line 156, in transform
xs = [preprocess(x) for x in xs]
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/encoder.py", line 156, in <listcomp>
xs = [preprocess(x) for x in xs]
File "/Volumes/Transcend/sentimentneuron/generating-reviews-discovering-sentiment/utils.py", line 53, in preprocess
text = html.unescape(text)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/__init__.py", line 130, in unescape
if '&' not in s:
TypeError: argument of type 'float' is not iterable
@massimosclaw Thank you for pointing out the error with sentiment_model. I have now corrected my original comment. It seems to me there is a problem with the data types in your column: the traceback shows html.unescape receiving a float where the encoder expects a string. Would you be able to check if there are any missing values? pandas reads empty cells as NaN, which is a float. Also, any data sanity checks on whether the strings are all encoded properly would help.
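A minimal sketch of such a check, assuming the text lives in a column named 'samples'. The three-row CSV is invented for illustration; its second row has an empty text cell.

```python
import io

import pandas as pd

# Hypothetical CSV: the second data row has no text in 'samples'.
csv_text = "samples,rating\nLoved it,5\n,3\nNot great &amp; slow,2\n"
data_df = pd.read_csv(io.StringIO(csv_text))

# Empty cells are read as NaN (a float), so count them first.
n_missing = int(data_df['samples'].isna().sum())
print(n_missing)

# Drop rows without text and make sure every remaining value is a str.
clean_df = data_df.dropna(subset=['samples']).copy()
clean_df['samples'] = clean_df['samples'].astype(str)
samples = clean_df['samples'].tolist()
```

Passing the cleaned list to transform avoids handing a float to the preprocessing step. Whether dropping rows is acceptable depends on the data; filling missing cells with an empty string is another option.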
@pgurazada No prob!
Unfortunately, I don't know how to check if there are any missing values (or what that means exactly... empty cells?) as well as data sanity checks on whether the strings are all encoded properly... don't know what that means either. Anything in particular I should search for to learn more about that?
Will do some googling around with those terms...
Found this... https://towardsdatascience.com/data-cleaning-with-python-and-pandas-detecting-missing-values-3e9c6ebcf78b - will read up on it.
@pgurazada I finally managed to get it to work just by deleting all other columns which contained links, dates, times, and other data. Thank you so much again for the help.
Wanted to ask one last question... how do you get the code to run on your GPU? As it takes a long time to run on my CPU.
I am so sorry, I missed this for some reason. On a GPU, predictions are about 10x faster. I was using the standard Colaboratory GPUs.
Could you kindly tell me why you have taken index 2388 in "sentiment_scores = text_features[:, 2388]"? What is the use of it?
@divyag11 Please check out the original paper from OpenAI. Column 2388 of the text features is the activation of the sentiment neuron.
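To make the indexing concrete, here is a small sketch with a random array standing in for the output of transform. The shape (one row per text, 4096 features per row) is my assumption based on the model described in the paper; the values are random, not real activations.

```python
import numpy as np

SENTIMENT_NEURON = 2388  # column index used earlier in this thread

# Stand-in for sentiment_model.transform(samples): 3 texts x 4096 features.
# Random values, for illustration only.
rng = np.random.default_rng(0)
text_features = rng.standard_normal((3, 4096))

# Select every row but only the sentiment neuron's column:
# the result holds one score per input text.
sentiment_scores = text_features[:, SENTIMENT_NEURON]
print(sentiment_scores.shape)
```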
Okay, thanks for your reply.
I can't find a way to incorporate this model into my code. I just need to get sentiment scores on some feedback. As this model is pre-trained, it would be of much help. How do I do this? Thanks!
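One possible way to wrap the steps from the first comment into a reusable function is sketched below. FakeModel and score_feedback are hypothetical names introduced here, not part of the repo; FakeModel only mimics the transform call so the example runs without the pre-trained weights.

```python
import numpy as np
import pandas as pd

SENTIMENT_NEURON = 2388  # column index used earlier in this thread


class FakeModel:
    """Hypothetical stand-in for encoder.Model, for illustration only."""

    def transform(self, texts):
        # Dummy features: the 'score' is just the text length.
        features = np.zeros((len(texts), 4096))
        features[:, SENTIMENT_NEURON] = [len(t) for t in texts]
        return features


def score_feedback(df, model, text_column='samples'):
    """Return a copy of df (rows without text dropped) with sentiment scores."""
    clean = df.dropna(subset=[text_column]).copy()
    texts = clean[text_column].astype(str).tolist()
    features = model.transform(texts)
    clean['sentiment_scores'] = features[:, SENTIMENT_NEURON]
    return clean


feedback = pd.DataFrame({'samples': ['Great product', None, 'Bad']})
scored = score_feedback(feedback, FakeModel())
print(scored)
```

With the real model, the idea would be to run the script from the repo folder, import Model from encoder as in the first comment, and pass Model() in place of FakeModel().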