Support for 2-grams - Githubissues

cristiano-belloni commented 5 years ago

Hello, I'm trying to override the AFINN scores for 2-grams, but it doesn't seem to work:

sentiment.analyze( 'This stuff is made up', { extras: { 'made up': -1 } } )

{ score: 0,
  comparative: 0,
  tokens: [ 'this', 'is', 'made', 'up' ],
  words: [],
  positive: [],
  negative: [] }

The effect is even more accentuated when a 2-gram would flip the overall score of a phrase; here "fucking good" reinforces a positive word, but the overall score is -1:

sentiment.analyze( 'This stuff is fucking good', { extras: { 'fucking good': 3 } } )
{ score: -1,
  comparative: -0.2,
  tokens: [ 'this', 'stuff', 'is', 'fucking', 'good' ],
  words: [ 'good', 'fucking' ],
  positive: [ 'good' ],
  negative: [ 'fucking' ] }

Would it be possible and a good idea to add support for overridden 2-grams?

martin-richter-uk commented 1 year ago

You could possibly add something like this to your code:


import Sentiment from 'sentiment';

// Phrases the word-level lexicon cannot match on its own
const negativePhrases = ['refund', 'drop in revenue'];
const positivePhrases = ['high-end', 'new product'];

export const analyzeSentiment = (text) => {

    let sentiment = new Sentiment();
    let result = sentiment.analyze(text);

    [...negativePhrases, ...positivePhrases].forEach((phrase, index) => {
        if(text?.toLowerCase().includes(phrase?.toLowerCase()) && result.words.indexOf(phrase?.toLowerCase()) === -1){
            let obj = {}
            if(index < negativePhrases.length){
                obj[phrase] = -3
            }else{
                obj[phrase] = 3
            }
            result.calculation.push(obj)
        }
    })

    let values = [];
    result.calculation.forEach((obj) => {
        values.push(Object.values(obj)?.[0])
    })

    result.comparative = average(values);
    return result;
}

export const average = arr => {
    if(arr?.length === 0 || arr === undefined){
        return 0
    }
    return arr.reduce((p, c) => p + c, 0) / arr.length
};
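
One thing to watch in the snippet above: it overwrites `comparative` but leaves `score` unchanged. The library's README describes the comparative score as the sum of the token scores divided by the number of tokens, so a variant that keeps both fields consistent might look like the sketch below. `applyPhraseScores`, `phraseScores`, and `baseResult` are hypothetical names, and the function works on a plain object shaped like an `analyze` result rather than calling the library.

```javascript
// Hypothetical helper: adds phrase-level scores to a result-shaped object
// ({ score, tokens, ... }) and recomputes comparative as score / tokens.length.
function applyPhraseScores(result, text, phraseScores) {
    const haystack = text.toLowerCase();
    let score = result.score;
    const phrases = [];
    for (const [phrase, value] of Object.entries(phraseScores)) {
        // Only multi-word phrases are considered here; single words are
        // assumed to be handled by the word-level pass already.
        if (!/\s/.test(phrase)) continue;
        // Plain substring match: each phrase counts at most once.
        if (haystack.includes(phrase.toLowerCase())) {
            score += value;
            phrases.push(phrase);
        }
    }
    const comparative = result.tokens.length > 0 ? score / result.tokens.length : 0;
    // Return a copy so the caller's original result is left untouched.
    return { ...result, score, comparative, phrases };
}

// A hand-written stand-in for an analyze() result, not real library output.
const baseResult = {
    score: 0,
    comparative: 0,
    tokens: ['we', 'saw', 'a', 'drop', 'in', 'revenue']
};

console.log(applyPhraseScores(baseResult, 'We saw a drop in revenue', { 'drop in revenue': -3 }));
```

Because the match is a substring test, a phrase such as 'new product' would also match inside 'new products'; whether that is desirable depends on the use case.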

Siddharth-Latthe-07 commented 1 month ago

@cristiano-belloni The issue you're encountering arises because the sentiment library's default tokenizer does not recognize multi-word expressions (like "made up" or "fucking good") out of the box. The library processes the text word by word, so multi-word phrases in the extras dictionary aren't being matched correctly. To handle multi-word expressions, you need to preprocess the text to identify and replace multi-word phrases with a single token before passing it to the sentiment analyzer.
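
To make the cause concrete, here is a minimal sketch of a word-by-word scorer. It is not the library's actual implementation; `tokenizeWords`, `scoreWordByWord`, and the toy lexicon values are assumptions for illustration. Because every lookup key is a single token, an entry whose key contains a space can never match.

```javascript
// Toy lexicon; values are illustrative, not taken from any real word list.
const toyLexicon = { good: 3, bad: -3 };

// Hypothetical tokenizer: lowercase, then split on whitespace.
function tokenizeWords(text) {
    return text.toLowerCase().trim().split(/\s+/);
}

// Hypothetical word-by-word scorer: each lookup uses exactly one token as key.
function scoreWordByWord(text, extras = {}) {
    const lexicon = { ...toyLexicon, ...extras };
    let score = 0;
    const words = [];
    for (const token of tokenizeWords(text)) {
        if (Object.prototype.hasOwnProperty.call(lexicon, token)) {
            score += lexicon[token];
            words.push(token);
        }
    }
    return { score, words };
}

// The key 'made up' is never looked up, because the tokens are
// 'made' and 'up' separately, so nothing matches.
console.log(scoreWordByWord('This stuff is made up', { 'made up': -1 }));
```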

Possible solution: preprocess the text for multi-word expressions.

1. Preprocess the text: replace multi-word phrases with single tokens before analyzing the sentiment.
2. Analyze sentiment: pass the preprocessed text to the sentiment analyzer.

Sample code snippet in JS, which might help you:

const Sentiment = require('sentiment');
const sentiment = new Sentiment();

function preprocessText(text, multiWordPhrases) {
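    // Replace multi-word phrases with single tokens.
    // Assumption: the analyzer's tokenizer keeps '_' inside a token;
    // verify this against the installed version of the library.
    for (const phrase of Object.keys(multiWordPhrases)) {
        const token = phrase.replace(/\s+/g, '_');
        // Escape regex metacharacters so the phrase is matched literally,
        // and allow any run of whitespace between its words
        const pattern = phrase
            .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
            .replace(/\s+/g, '\\s+');
        // Lookarounds keep the match from starting or ending mid-word
        const regex = new RegExp(`(?<!\\w)${pattern}(?!\\w)`, 'gi');
        text = text.replace(regex, () => token);
    }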
    return text;
}

function analyzeSentiment(text, extras) {
    const multiWordPhrases = extras;
    const preprocessedText = preprocessText(text, multiWordPhrases);

    // Adjust the extras object to match the preprocessed tokens
    const adjustedExtras = {};
    for (let phrase in multiWordPhrases) {
        const token = phrase.replace(/\s+/g, '_');
        adjustedExtras[token] = multiWordPhrases[phrase];
    }

    // Analyze the sentiment of the preprocessed text
    return sentiment.analyze(preprocessedText, { extras: adjustedExtras });
}

// Example usage
const text1 = 'This stuff is made up';
const extras1 = { 'made up': -1 };
console.log(analyzeSentiment(text1, extras1));

const text2 = 'This stuff is fucking good';
const extras2 = { 'fucking good': 3 };
console.log(analyzeSentiment(text2, extras2));

hope this helps, Thanks
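
An alternative to rewriting the input is to match 2-grams directly during scoring. The sketch below does not use the sentiment library; `scoreWithBigrams`, `unigramScores`, and `bigramScores` are hypothetical names, and the lexicon values are illustrative. At each position it tries the two-token phrase first and falls back to the single token, so a matched 2-gram replaces the scores of its component words instead of adding to them.

```javascript
// Illustrative scores only; a real lexicon would be much larger.
const unigramScores = { good: 3, fucking: -4 };
const bigramScores = { 'fucking good': 3, 'made up': -1 };

function scoreWithBigrams(text, unigrams, bigrams) {
    const tokens = text.toLowerCase().trim().split(/\s+/);
    const has = (table, key) => Object.prototype.hasOwnProperty.call(table, key);
    let score = 0;
    const matched = [];
    let i = 0;
    while (i < tokens.length) {
        // Build the 2-gram starting at position i, if there is a next token.
        const pair = i + 1 < tokens.length ? `${tokens[i]} ${tokens[i + 1]}` : null;
        if (pair !== null && has(bigrams, pair)) {
            // The 2-gram wins: consume both tokens and skip their unigram scores.
            score += bigrams[pair];
            matched.push(pair);
            i += 2;
        } else {
            if (has(unigrams, tokens[i])) {
                score += unigrams[tokens[i]];
                matched.push(tokens[i]);
            }
            i += 1;
        }
    }
    const comparative = tokens.length > 0 ? score / tokens.length : 0;
    return { score, comparative, tokens, matched };
}

console.log(scoreWithBigrams('This stuff is fucking good', unigramScores, bigramScores));
console.log(scoreWithBigrams('This stuff is made up', unigramScores, bigramScores));
```

With these toy values, the first call scores the phrase 'fucking good' as a unit rather than summing 'fucking' and 'good' separately, which is the behaviour the original question asks for.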

thisandagain / sentiment

Support for 2-grams #158