Open cristiano-belloni opened 5 years ago
You could possibly add something like this to your code:
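import Sentiment from 'sentiment';

// Phrases to score manually; matched case-insensitively as substrings of the text.
const negativePhrases = ['refund', 'drop in revenue'];
const positivePhrases = ['high-end', 'new product'];

export const analyzeSentiment = (text) => {
  const sentiment = new Sentiment();
  const result = sentiment.analyze(text);
  const lowerText = text?.toLowerCase() ?? '';

  [...negativePhrases, ...positivePhrases].forEach((phrase, index) => {
    const lowerPhrase = phrase.toLowerCase();
    // Only add a phrase that occurs in the text and was not already scored as a word.
    if (lowerText.includes(lowerPhrase) && !result.words.includes(lowerPhrase)) {
      // Negative phrases come first in the combined array, so the index tells them apart.
      const value = index < negativePhrases.length ? -3 : 3;
      result.calculation.push({ [phrase]: value });
    }
  });

  // Recompute comparative as the mean of all per-entry scores.
  // Note: result.score is left unchanged, and this differs from the library's
  // own comparative, which divides the score by the number of tokens.
  const values = result.calculation.map((obj) => Object.values(obj)[0]);
  result.comparative = average(values);
  return result;
};

// Mean of a numeric array; returns 0 for a missing or empty array.
export const average = (arr) => {
  if (arr === undefined || arr.length === 0) {
    return 0;
  }
  return arr.reduce((p, c) => p + c, 0) / arr.length;
};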
@cristiano-belloni The issue you're encountering arises because the sentiment library's default tokenizer does not recognize multi-word expressions (like "made up" or "fucking good") out of the box. The library processes the text word by word, so multi-word phrases in the extras dictionary aren't being matched correctly. To handle multi-word expressions, you need to preprocess the text to identify and replace multi-word phrases with a single token before passing it to the sentiment analyzer.
A possible solution, as a sample JavaScript snippet that might help:
const Sentiment = require('sentiment');
const sentiment = new Sentiment();

// Escape regex metacharacters so a phrase is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn a phrase into a single lowercase token, e.g. 'made up' -> 'made_up'.
// Assumption: the tokenizer keeps '_' inside a token. Check this against the
// installed version; if it strips underscores, use a joiner that it preserves.
function toToken(phrase) {
  return phrase.toLowerCase().replace(/\s+/g, '_');
}

function preprocessText(text, multiWordPhrases) {
  // Replace multi-word phrases with single tokens
  for (const phrase of Object.keys(multiWordPhrases)) {
    const regex = new RegExp(escapeRegExp(phrase), 'gi');
    text = text.replace(regex, () => toToken(phrase));
  }
  return text;
}

function analyzeSentiment(text, extras) {
  const preprocessedText = preprocessText(text, extras);
  // Adjust the extras object to match the preprocessed tokens
  const adjustedExtras = {};
  for (const phrase of Object.keys(extras)) {
    adjustedExtras[toToken(phrase)] = extras[phrase];
  }
  // Analyze the sentiment of the preprocessed text
  return sentiment.analyze(preprocessedText, { extras: adjustedExtras });
}

// Example usage
const text1 = 'This stuff is made up';
const extras1 = { 'made up': -1 };
console.log(analyzeSentiment(text1, extras1));

const text2 = 'This stuff is fucking good';
const extras2 = { 'fucking good': 3 };
console.log(analyzeSentiment(text2, extras2));
Hope this helps. Thanks!
Hello, I'm trying to override the AFINN scores for 2-grams, but it doesn't seem to work:
The effect is even more accentuated when a 2-gram would flip the overall score of a phrase; here "fucking good" reinforces a positive word, but the overall score is -1:
Would it be possible and a good idea to add support for overridden 2-grams?
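To make the request concrete, here is a minimal, dependency-free sketch of what 2-gram overrides could look like. It is not the library's implementation. The `lexicon` object, the simplified `tokenize`, and the `scoreWithBigrams` function are all hypothetical names made up for illustration, and the two word scores are illustrative values chosen so that the plain word-by-word sum gives the -1 mentioned above.

```javascript
// Hypothetical mini-lexicon standing in for AFINN; scores are illustrative only.
const lexicon = { good: 3, fucking: -4 };

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Simplified tokenizer (an assumption, not the library's): lowercase,
// replace basic punctuation with spaces, split on whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[.,!?;:]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Score tokens left to right, trying a 2-gram override before single words.
function scoreWithBigrams(text, bigramOverrides = {}) {
  const tokens = tokenize(text);
  const calculation = [];
  let score = 0;

  for (let i = 0; i < tokens.length; i++) {
    const bigram = i + 1 < tokens.length ? `${tokens[i]} ${tokens[i + 1]}` : null;
    if (bigram !== null && has(bigramOverrides, bigram)) {
      score += bigramOverrides[bigram];
      calculation.push({ [bigram]: bigramOverrides[bigram] });
      i += 1; // consume the second word so it is not scored again on its own
      continue;
    }
    if (has(lexicon, tokens[i])) {
      score += lexicon[tokens[i]];
      calculation.push({ [tokens[i]]: lexicon[tokens[i]] });
    }
  }

  const comparative = tokens.length > 0 ? score / tokens.length : 0;
  return { score, comparative, tokens, calculation };
}

// Without the override the two words are scored separately; with it, the
// 2-gram replaces both of them.
console.log(scoreWithBigrams('This stuff is fucking good').score);
console.log(scoreWithBigrams('This stuff is fucking good', { 'fucking good': 3 }).score);
```

The 2-gram is checked before the single words and consumes both tokens when it matches, so an override replaces the individual word scores instead of being added on top of them. Whether longer n-grams or overlapping matches should be supported is a separate design question.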