morlowsk closed 8 years ago
positive coefficients -> positive sentiment
On Fri, Nov 6, 2015 at 2:43 PM, morlowsk notifications@github.com wrote:
I am having trouble coding this function because I don't understand what would signify a positive term or a negative term in a document. I know the clf has a coef_ attribute that gives me the coefficients for all the terms in our vocabulary, but I don't know whether positive coefficients signify positive sentiment and negative coefficients signify negative sentiment.
— Reply to this email directly or view it on GitHub https://github.com/iit-cs579/main/issues/115.
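The rule given in the reply above (positive coefficients -> positive sentiment) can be checked with a small sketch. It assumes clf is a binary logistic regression with class 1 as the positive class, which the thread does not state explicitly. The coefficient values and the helper name prob_class_1 are made up for illustration and are not part of the assignment code.

```python
import math

def prob_class_1(coef, intercept, doc_vector):
    """Probability of class 1 under a binary logistic regression (assumed model)."""
    z = intercept + sum(w * x for w, x in zip(coef, doc_vector))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical two-term vocabulary: index 0 has a positive coefficient,
# index 1 has a negative coefficient.
coef = [1.2, -2.0]

p_empty = prob_class_1(coef, 0.0, [0, 0])     # neither term present
p_positive = prob_class_1(coef, 0.0, [1, 0])  # only the positive-coefficient term
p_negative = prob_class_1(coef, 0.0, [0, 1])  # only the negative-coefficient term

print(p_empty, p_positive, p_negative)
```

Under these assumptions, a term with a positive coefficient pushes the prediction toward class 1 and a term with a negative coefficient pushes it toward class 0.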
Hmm, well I don't understand why I would get this result.
for document data/test/pos/10055_10.txt, the term most predictive of class 0 is can (index=652)
for document data/test/pos/10055_10.txt, the term most predictive of class 1 is best (index=492)
Try sorting them properly according to class label.
How do you sort terms by class label if they only have real-valued coefficients? At the moment, I am just taking the maximum of the coefficients for the terms in the document when class_idx is 1, and the minimum otherwise. I don't understand how it would work for one label but not for the other.
ya that's what I meant.
I don't think the term "can" appears in document data/test/pos/10055_10.txt, so perhaps something is wrong with how you're mapping words to indices?
Perhaps clearing memory and running from scratch will resolve some problems.
Yeah wiping memory clean and running it over again didn't help. Where do we map words to indices again? Things were just fine until this function.
Hi, the coefficients in clf.coef_ (an attribute, not a method) correspond to the terms learned in do_vectorize(). At this stage each feature entry is a binary value, so rule out all terms that do not appear in the given document. Hope this helps.
Hi,
Still I don't understand this most_predictive_term_in_doc() function. Could you please elaborate?
The regression model learned by clf has one coefficient per word in the whole vocabulary, but a specific document contains only a small subset of that vocabulary. You need to find the most predictive words within this small subset.
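The idea described above can be sketched as follows. This is not the course's reference solution: the signature of most_predictive_term_in_doc, the plain-list inputs, and the toy vocabulary and coefficients are all assumptions made for illustration. With scikit-learn, coef would presumably be clf.coef_[0] and doc_vector the document's row from the binary feature matrix.

```python
def most_predictive_term_in_doc(coef, doc_vector, vocab, class_idx):
    """Return (term, index) of the term in this document most predictive of class_idx.

    coef       : one coefficient per vocabulary term (hypothetical input)
    doc_vector : binary feature vector for a single document
    vocab      : dict mapping term -> column index
    class_idx  : 1 for the positive class, 0 for the negative class
    """
    index_to_term = {idx: term for term, idx in vocab.items()}
    # Keep only the columns whose term actually appears in the document.
    present = [i for i, value in enumerate(doc_vector) if value != 0]
    if not present:
        return None
    if class_idx == 1:
        best = max(present, key=lambda i: coef[i])  # largest coefficient
    else:
        best = min(present, key=lambda i: coef[i])  # smallest coefficient
    return index_to_term[best], best

# Toy data: "can" has the most negative coefficient overall,
# but it does not occur in the document.
vocab = {"best": 0, "can": 1, "dull": 2, "movie": 3}
coef = [1.2, -2.0, -0.7, 0.1]
doc_vector = [1, 0, 1, 1]

result_pos = most_predictive_term_in_doc(coef, doc_vector, vocab, 1)
result_neg = most_predictive_term_in_doc(coef, doc_vector, vocab, 0)
print(result_pos, result_neg)
```

Taking the minimum over the whole coefficient vector instead of only the present columns would return "can" here, even though that term is not in the document, which matches the symptom reported earlier in the thread.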
Oh okay. Thank you.