Closed aviv closed 6 years ago
Thanks @aviv!
We could change this line:
https://github.com/wooorm/syllable/blob/d7ef1ec90892f74f25eb5f819359a4f58550b3d5/index.js#L306
...to something like this:
var values = contractions(normalize(String(value)).toLowerCase()).split(SPLIT);
Where contractions would be something like:
var SMART_APOSTROPHE = /’/g;
var STRAIGHT_APOSTROPHE = /'/g;
var CONTRACTIONS = /that's|they're|aren't|etc/g;
function contractions(value) {
  // Normalize smart apostrophes first, then strip the apostrophe from known contractions.
  return value.replace(SMART_APOSTROPHE, '\'').replace(CONTRACTIONS, replacer);

  function replacer($0) {
    return $0.replace(STRAIGHT_APOSTROPHE, '');
  }
}
That should do the trick I think? Would you like to work on this?
Thanks for the quick response @wooorm! That seems like a great approach for most cases. I haven't checked every contraction, but I am noticing that "aren't" -> 3 and "arent" -> 2, even though "aren't" apparently only has one syllable (cf. https://www.howmanysyllables.com/words/aren't). Perhaps that's a different edge case though.
I'd definitely be interested in taking a stab at a PR! But I'm also wondering if there'd be a way to accomplish this without explicitly listing out every possible contraction. Maybe we can just look for an apostrophe inside of a word? Or even just remove all apostrophes from the text being analyzed? I'm not sure if that would have any unintended side-effects, but intuitively it seems like it wouldn't, and that might be a simpler approach. Either of those solutions would have the advantage of handling possessive contractions, which are also currently getting an extra syllable (e.g. "brother's" -> 3, "brothers" -> 2) and which could be used in too many words to explicitly list.
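To make the "apostrophe inside of a word" idea concrete, here is a rough sketch. The helper name stripInnerApostrophes and the INNER_APOSTROPHE pattern are hypothetical and not part of syllable; it assumes that only apostrophes sitting between two letters should be dropped, so quote marks around a word are left alone.

```javascript
// Hypothetical helper, not part of syllable: removes apostrophes that sit
// between two letters, so "brother's" would be counted like "brothers".
var SMART_APOSTROPHE = /’/g;
// A letter, an apostrophe, then (lookahead) another letter.
var INNER_APOSTROPHE = /([a-z])'(?=[a-z])/gi;

function stripInnerApostrophes(value) {
  return String(value)
    .replace(SMART_APOSTROPHE, '\'') // normalize smart apostrophes first
    .replace(INNER_APOSTROPHE, '$1'); // keep the letter, drop the apostrophe
}
```

This would cover contractions and possessives without a word list, but it does nothing for a word like "aren't" if "arent" itself is counted differently from what you expect.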
> I am noticing that "aren't" -> 3 and "arent" -> 2, even though "aren't" apparently only has one syllable.
Hmm, I dunno. I’m not a native English speaker, but I’d say you could argue about it. Anyway, you could add those here:
https://github.com/wooorm/syllable/blob/d7ef1ec90892f74f25eb5f819359a4f58550b3d5/index.js#L11-L15
> But I'm also wondering if there'd be a way to accomplish this without explicitly listing out every possible contraction. Maybe we can just look for an apostrophe inside of a word? Or even just remove all apostrophes from the text being analyzed? I'm not sure if that would have any unintended side-effects, but intuitively it seems like it wouldn't, and that might be a simpler approach. Either of those solutions would have the advantage of handling possessive contractions, which are also currently getting an extra syllable (e.g. "brother's" -> 3, "brothers" -> 2) and which could be used in too many words to explicitly list.
Yeah, maybe. I can’t think of any downsides. If you can’t come up with anything either, let’s do that!
@aviv Ping! 🔔
I would say "aren't" is only one syllable.
@ibanner56 why? Wouldn't it be pronounced are-ent?
I think the common pronunciation is just "arnt". It might be a localization thing, but the first results I found when I searched for an answer said it was only 1.
Actually, it feels more like a context thing when I think about my own usage. It depends on the cadence of the sentence more than anything else.
Thinking about it, I'm actually more partial to 2, rather than 1.
Hi! We noticed that contractions aren't handled properly. A few examples:
that's -> 2 (should be 1)
they're -> 2 (should be 1)
aren't -> 3 (should be 1)
Let me know if you have any thoughts on how to account for contractions!
Thanks, Aviv (Engineering Lead at Flocabulary)