I was using the sentiment library and noticed that when I ran analysis on headlines containing single quotes, the quoted words were not tokenized properly.
For example, for the news headline from cnn.com that reads:
Abrams: Trump is 'wrong,' I am qualified to be Georgia's governor
Here, 'wrong' should be tokenized as wrong, without the surrounding quotes.
Currently the library strips double quotes from words but leaves single quotes in place. My guess is that this is meant to preserve apostrophes: if you add \' to the .replace regex, all single quotes are removed, including the apostrophe in Georgia's.
Here is some code to reproduce the issue:
const Sentiment = require('sentiment');
const sentiment = new Sentiment();
// The same headline with no quotes, single quotes, and double quotes around "wrong"
const noQuotes = "Abrams: Trump is wrong, I am qualified to be Georgia's governor";
const singleQuotes = "Abrams: Trump is \'wrong\', I am qualified to be Georgia's governor";
const doubleQuotes = "Abrams: Trump is \"wrong,\" I am qualified to be Georgia's governor";
// Analyze each variant so the results can be compared
const noQuotesResult = sentiment.analyze(noQuotes);
const doubleQuotesResult = sentiment.analyze(doubleQuotes);
const singleQuotesResult = sentiment.analyze(singleQuotes);
console.log(noQuotesResult);
console.log(doubleQuotesResult);
console.log(singleQuotesResult);
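One possible way to handle the apostrophe concern is to strip single quotes only at word edges before tokenizing, rather than removing every single quote. The sketch below is only an illustration: stripQuoteMarks is a hypothetical helper, not part of the sentiment library, and the two regexes are my assumption of what "a quote at a word edge" means. It would also drop the trailing apostrophe in plural possessives such as voters', so it is not a complete fix.

```javascript
// Hypothetical helper (not part of the sentiment library).
// Removes single quotes that wrap a word, keeps apostrophes inside words.
function stripQuoteMarks(text) {
    return text
        // Opening quote: at the start of the string or right after whitespace
        .replace(/(^|\s)'+/g, '$1')
        // Closing quote: right before whitespace, common punctuation, or the end
        .replace(/'+(?=[\s.,!?;:]|$)/g, '');
}

const headline = "Abrams: Trump is 'wrong,' I am qualified to be Georgia's governor";
console.log(stripQuoteMarks(headline));
// prints: Abrams: Trump is wrong, I am qualified to be Georgia's governor
```

The cleaned string could then be passed to sentiment.analyze in the same way as the strings in the reproduction code.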