words / syllable

Count syllables in an English word
https://words.github.io/syllable/
MIT License

Contractions are not counted correctly #18

Closed: aviv closed this issue 6 years ago

aviv commented 7 years ago

Hi! We noticed that contractions aren't handled properly. A few examples:

that's -> 2 (should be 1)
they're -> 2 (should be 1)
aren't -> 3 (should be 1)

Let me know if you have any thoughts on how to account for contractions!

Thanks, Aviv (Engineering Lead at Flocabulary)

wooorm commented 7 years ago

Thanks @aviv!

We could change this line:

https://github.com/wooorm/syllable/blob/d7ef1ec90892f74f25eb5f819359a4f58550b3d5/index.js#L306

...to something like this:

  var values = contractions(normalize(String(value)).toLowerCase()).split(SPLIT);

Where contractions would be something like:

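var SMART_APOSTROPHE = /’/g;
var STRAIGHT_APOSTROPHE = /'/g;
// "etc" is a placeholder for the rest of the contraction list.
var CONTRACTIONS = /that's|they're|aren't|etc/g;

function contractions(value) {
  // Normalize smart apostrophes, then strip the apostrophe from each
  // known contraction. The result must be returned to the caller.
  return value.replace(SMART_APOSTROPHE, '\'').replace(CONTRACTIONS, replacer);
  function replacer($0) {
    return $0.replace(STRAIGHT_APOSTROPHE, '');
  }
}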

That should do the trick I think? Would you like to work on this?
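For illustration, here is a minimal sketch of how the pattern could be built from a list instead of being written by hand. The name `CONTRACTION_LIST` and its contents are assumptions made for this example, not part of the library, and the word boundaries are an addition to the pattern proposed above.

```javascript
// Hypothetical, incomplete list; extend as needed.
var CONTRACTION_LIST = ["that's", "they're", "aren't"];

// \b keeps the pattern from matching inside longer words.
var CONTRACTIONS = new RegExp(
  '\\b(?:' + CONTRACTION_LIST.join('|') + ')\\b',
  'g'
);

function contractions(value) {
  return value
    // Turn smart apostrophes into straight ones first.
    .replace(/’/g, "'")
    // Then drop the apostrophe from each listed contraction.
    .replace(CONTRACTIONS, function ($0) {
      return $0.replace(/'/g, '');
    });
}

console.log(contractions("they’re sure that's fine"));
// theyre sure thats fine
```

The input is expected to be lowercased already, as in the `normalize(...).toLowerCase()` call above.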

aviv commented 7 years ago

Thanks for the quick response @wooorm! That seems like a great approach for most cases. I haven't checked every contraction, but I am noticing that "aren't" -> 3 and "arent" -> 2, even though "aren't" apparently only has one syllable (cf. https://www.howmanysyllables.com/words/aren't). Perhaps that's a different edge case though.

I'd definitely be interested in taking a stab at a PR! But I'm also wondering if there'd be a way to accomplish this without explicitly listing out every possible contraction. Maybe we can just look for an apostrophe inside of a word? Or even just remove all apostrophes from the text being analyzed? I'm not sure if that would have any unintended side-effects, but intuitively it seems like it wouldn't, and that might be a simpler approach. Either of those solutions would have the advantage of handling possessive contractions, which are also currently getting an extra syllable (e.g. "brother's" -> 3, "brothers" -> 2) and which could be used in too many words to explicitly list.
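A rough sketch of the "apostrophe inside of a word" idea, for illustration only. The names `INNER_APOSTROPHE` and `stripInnerApostrophes` are hypothetical, and the pattern assumes unaccented Latin letters.

```javascript
// Matches a straight or smart apostrophe that sits between two letters.
var INNER_APOSTROPHE = /([a-z])['’](?=[a-z])/gi;

function stripInnerApostrophes(value) {
  // Keep the letter before the apostrophe, drop the apostrophe itself.
  return String(value).replace(INNER_APOSTROPHE, '$1');
}

console.log(stripInnerApostrophes("my brother's friends aren't here"));
// my brothers friends arent here

console.log(stripInnerApostrophes("'quoted' words"));
// 'quoted' words
```

Because the apostrophe must have a letter on both sides, quotation marks and trailing possessives such as "brothers'" are left alone, while contractions and possessives like "brother's" are covered without listing them.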

wooorm commented 7 years ago

> I am noticing that "aren't" -> 3 and "arent" -> 2, even though "aren't" apparently only has one syllable.

Hmm, I dunno. I’m not a native English speaker, but I’d say you could argue about that. Anyway, you could add those here:

https://github.com/wooorm/syllable/blob/d7ef1ec90892f74f25eb5f819359a4f58550b3d5/index.js#L11-L15
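As an illustration of the general idea, here is a sketch of an override table that is consulted before any heuristic runs. It assumes the linked lines hold that kind of word-to-count lookup, which is not confirmed here. `OVERRIDES`, `naiveCount`, and `countWithOverrides` are hypothetical names, and the stand-in heuristic is not the library's algorithm.

```javascript
// Hypothetical override table for words the heuristics get wrong.
// The value for "arent" is only an example; the right count is debatable.
var OVERRIDES = {arent: 1};

// Stand-in heuristic (NOT the library's algorithm): count vowel groups.
function naiveCount(word) {
  var groups = word.match(/[aeiouy]+/g);
  return groups ? groups.length : 0;
}

function countWithOverrides(word) {
  // Lowercase and drop apostrophes so "aren't" and "arent" share one key.
  var key = word.toLowerCase().replace(/['’]/g, '');
  return Object.prototype.hasOwnProperty.call(OVERRIDES, key)
    ? OVERRIDES[key]
    : naiveCount(key);
}

console.log(countWithOverrides("aren't")); // 1, taken from the table
console.log(countWithOverrides('brother')); // 2, from the stand-in heuristic
```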

> But I'm also wondering if there'd be a way to accomplish this without explicitly listing out every possible contraction. Maybe we can just look for an apostrophe inside of a word? Or even just remove all apostrophes from the text being analyzed? I'm not sure if that would have any unintended side-effects, but intuitively it seems like it wouldn't, and that might be a simpler approach. Either of those solutions would have the advantage of handling possessive contractions, which are also currently getting an extra syllable (e.g. "brother's" -> 3, "brothers" -> 2) and which could be used in too many words to explicitly list.

Yeah, maybe. I can’t think of any downsides. If you can’t come up with anything either, let’s do that!

wooorm commented 7 years ago

@aviv Ping! 🔔

ibanner56 commented 6 years ago

I would say "aren't" is only one syllable.

wooorm commented 6 years ago

@ibanner56 why? Wouldn't it be pronounced are-ent?

ibanner56 commented 6 years ago

I think the common pronunciation is just "arnt". It might be a localization thing, but the first results I found when I searched for an answer said it was only 1.

ibanner56 commented 6 years ago

Actually, it feels more like a context thing when I think about my own usage. It depends on the cadence of the sentence more than anything else.

Thinking about it, I'm actually more partial to 2, rather than 1.