kavgan / nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
http://kavita-ganesan.com/kavitas-tutorials/#.WvIizNMvyog

How to automate for automated prediction. #11

Open djjeremiahj opened 1 year ago

djjeremiahj commented 1 year ago

I've connected the model to my data and get a 95% accuracy rate! It works perfectly. Now I'm trying to use the model to iterate through the entire dataset and output the predictions for all 63K items.

I've tried a simple for loop:

for each in df['short_description'].head(5):
    test_features=transformer.transform(each)
    get_top_k_predictions(model,each,2)

This raises: ValueError: Iterable over raw text documents expected, string object received.

My intention is to use this as a second method of prediction to verify the results of the structured programming that I've done. Over time, it will eventually become the primary method.

There are 63K records in the file (and growing).

Any help would be greatly appreciated.
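
For reference, the ValueError quoted above is what scikit-learn vectorizers raise when transform receives a single string instead of an iterable of documents (a list, a pandas Series, and so on). The loop also passes the raw text `each`, not `test_features`, to the prediction call. Below is a minimal sketch of batch prediction, assuming `transformer` is a scikit-learn TfidfVectorizer and `model` is a LogisticRegression. The toy training data and the `top_k_labels` helper are hypothetical stand-ins: the real data is not shown in this post, and the helper only approximates what the repository's get_top_k_predictions is assumed to do.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the real training data (hypothetical).
train_texts = [
    "printer is out of toner",
    "printer paper jam again",
    "cannot log in to email",
    "email password reset needed",
    "laptop screen is cracked",
    "laptop battery will not charge",
]
train_labels = ["printer", "printer", "email", "email", "laptop", "laptop"]

# Assumed setup: a fitted vectorizer plus a fitted classifier.
transformer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(transformer.fit_transform(train_texts), train_labels)


def top_k_labels(model, features, k):
    """Return the k most probable labels for each row, best first.

    Hypothetical helper; it takes a feature matrix, not raw text.
    """
    probs = model.predict_proba(features)
    # Sort each row by descending probability and keep the first k columns.
    best = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    return [[str(model.classes_[i]) for i in row] for row in best]


# Stand-in for df['short_description'].tolist() (hypothetical values).
descriptions = ["printer jam", "email login problem", "laptop will not charge"]

# Transform every document in one call, then predict on the feature matrix.
features = transformer.transform(descriptions)
predictions = top_k_labels(model, features, 2)

# A single string has to be wrapped in a list before calling transform.
single = top_k_labels(model, transformer.transform(["printer jam"]), 1)

for text, labels in zip(descriptions, predictions):
    print(text, "->", labels)
```

With a DataFrame, the same pattern would presumably be a single `transformer.transform(df['short_description'])` call followed by one prediction call on the resulting matrix, which avoids a Python-level loop over all 63K rows. Missing values in the column would likely need to be filled or dropped first.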