Open jvenepal opened 4 years ago
Hello Jake,
In chapter 05.05-Naive-Bayes, there is the following set of commands:
    train = fetch_20newsgroups(subset='train', categories=categories)
    test = fetch_20newsgroups(subset='test', categories=categories)
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(train.data, train.target)
    labels = model.predict(test.data)
They work fine. But when I try to split them into individual commands, I run into an error with model.predict():
    from sklearn.feature_extraction.text import TfidfVectorizer
    vec = TfidfVectorizer()
    trainData = vec.fit_transform(train.data)
    testData = vec.fit_transform(test.data)
    from sklearn.naive_bayes import MultinomialNB
    model = MultinomialNB()
    model.fit(trainData, train.target)
    testLables = model.predict(testData)
model.predict(testData) errors out. The error is: ValueError: dimension mismatch
Do you know what I am doing wrong? FYI, here are the lengths of my train/test data:
    In[12]: print(len(train.data), len(train.target), len(test.data), len(test.target))
    Out[12]: 4528 4528 3012 3012
If the mismatch between the lengths of train.data and test.data is the cause, I am wondering why make_pipeline didn't run into the same problem.
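A minimal sketch of the likely cause, using a made-up toy corpus rather than the 20 newsgroups data (the variable names and documents below are illustrative assumptions). The differing number of documents is not the issue. Calling fit_transform on the test data makes the vectorizer learn a new vocabulary, so the test matrix ends up with a different number of columns than the matrix the model was fitted on. A pipeline avoids this because its predict() only calls transform() on the already fitted vectorizer. The manual equivalent would be:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, made up for illustration (not the 20 newsgroups data)
train_docs = [
    "the rocket launched into orbit",
    "the team won the hockey game",
    "nasa plans a new space mission",
    "the goalie stopped every shot",
]
train_target = [0, 1, 0, 1]
test_docs = [
    "a shuttle reached orbit today",
    "the hockey season starts soon",
]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)  # learn vocabulary and IDF weights from the training text
X_test = vec.transform(test_docs)        # reuse the fitted vocabulary; no refitting

model = MultinomialNB()
model.fit(X_train, train_target)
labels = model.predict(X_test)           # works: column counts match

# Refitting on the test text builds a separate vocabulary,
# so the column count no longer matches the training matrix.
X_test_refit = TfidfVectorizer().fit_transform(test_docs)
print(X_train.shape, X_test.shape, X_test_refit.shape)
```

In the split-up code above, that would mean replacing vec.fit_transform(test.data) with vec.transform(test.data).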