words / syllable

Count syllables in an English word
https://words.github.io/syllable/
MIT License
223 stars, 23 forks

Fixes #19, #18, #20

Closed. ibanner56 closed this 6 years ago.

ibanner56 commented 6 years ago

Add exceptions for -ed suffixes, and fix a bug that added syllables for -shed and -sed.
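
To make the -ed rule concrete, here is a minimal sketch. It assumes a naive counter that counts groups of consecutive vowels; the function names are hypothetical and this is not the syllable package's actual implementation. A final "-ed" is treated as silent unless it follows "t" or "d".

```javascript
// Hypothetical sketch, not the syllable package's code.
// Count groups of consecutive vowels, treating "y" as a vowel.
function countVowelGroups(word) {
  const groups = word.match(/[aeiouy]+/g)
  return groups ? groups.length : 0
}

// A final "-ed" is usually silent ("pushed", "raised"), except after
// "t" or "d", where it forms its own syllable ("wanted", "ended").
// After a vowel ("freed", "played") the "e" is already merged into
// the previous vowel group, so nothing is subtracted.
function countWithEdRule(word) {
  let count = countVowelGroups(word)
  if (/[^aeiouytd]ed$/.test(word)) {
    count -= 1
  }
  // Every word has at least one syllable ("shed", "bed").
  return Math.max(count, 1)
}
```

With this rule, `countWithEdRule('pushed')` and `countWithEdRule('raised')` give 1, and `countWithEdRule('wanted')` gives 2. Words such as "hundred" still come out as 1 instead of 2, which is the kind of case a hard-coded exception list has to handle.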

Extract apostrophes to support contractions.
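
A minimal sketch of the apostrophe handling, assuming the intent is to remove apostrophes so that a contraction reaches the counter as a single word. The `normalise` name is hypothetical; both the ASCII apostrophe and the typographic U+2019 are stripped.

```javascript
// Hypothetical helper: lower-case a token and drop apostrophes so that
// contractions such as "don't" or "they’re" are counted as one word.
function normalise(word) {
  return word
    .toLowerCase()
    // Remove the ASCII apostrophe (') and the right single quotation mark (’).
    .replace(/['\u2019]/g, '')
}
```

Stripping rather than splitting means "don't" is counted as "dont" instead of as two tokens, "don" and "t".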

wooorm commented 6 years ago

Hey hey 👋

This looks super!!! Few things though:

  1. Could you add tests for contractions?
  1. This breaks some of the tests (see Travis) that we took from Text-Statistics. Could you check those, and argue why these new values are better than the previous ones?

Hope that’s not too much to ask!

Cheers, Titus

ibanner56 commented 6 years ago

So "shed" and "sed" behave differently depending on whether they appear mid-word or at the end of the word. I'm working on fixing the check now; at the moment I seem to have broken it in both directions...
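
The mid-word versus word-final difference comes down to anchoring. A minimal sketch, assuming the check is a regular expression (the patterns and function names below are illustrative, not taken from the package): without `$`, the pattern also fires in "sedan" and "shedding", where the "e" is pronounced.

```javascript
// Count groups of consecutive vowels, treating "y" as a vowel.
function countVowelGroups(word) {
  const groups = word.match(/[aeiouy]+/g)
  return groups ? groups.length : 0
}

// Matches "sed" or "shed" anywhere in the word.
const unanchored = /sh?ed/
// Matches "sed" or "shed" only at the end of the word.
const anchored = /sh?ed$/

// Subtract one syllable when the pattern matches, never going below 1.
function countWith(pattern, word) {
  const count = countVowelGroups(word) - (pattern.test(word) ? 1 : 0)
  return Math.max(count, 1)
}
```

`countWith(unanchored, 'sedan')` wrongly gives 1, while `countWith(anchored, 'sedan')` gives 2 and `countWith(anchored, 'pushed')` still gives 1.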

wooorm commented 6 years ago

Thanks a lot for working on this btw!

ibanner56 commented 6 years ago

Oh, I see: the problem is that we yank off the -ly, so for "avowedly" and "advisedly" we just count the syllables in "avowed" and "advised" and then add 1.
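
The interaction can be reproduced with a small sketch, again assuming a vowel-group counter with a silent-"-ed" rule (all names hypothetical, not the package's code). Stripping "-ly" first lets the silent-"-ed" rule fire on the stem, so "advisedly" (ad-vis-ed-ly) comes out as 3 instead of 4. `countFixed` is one possible workaround, an assumption rather than the change made in this PR: it treats "-ed" before "-ly" as pronounced.

```javascript
// Count groups of consecutive vowels, treating "y" as a vowel.
function countVowelGroups(word) {
  const groups = word.match(/[aeiouy]+/g)
  return groups ? groups.length : 0
}

// Stem counter with a silent final "-ed" (except after "t" or "d").
function countStem(word) {
  let count = countVowelGroups(word)
  if (/[^aeiouytd]ed$/.test(word)) count -= 1
  return Math.max(count, 1)
}

// Naive: strip "-ly", count the stem, then add one for "ly".
function countNaive(word) {
  if (word.endsWith('ly')) return countStem(word.slice(0, -2)) + 1
  return countStem(word)
}

// Hypothetical fix: in "-edly" the "-ed" is pronounced, so count the
// stem's vowel groups without the silent-"-ed" adjustment.
function countFixed(word) {
  if (word.endsWith('edly')) return countVowelGroups(word.slice(0, -2)) + 1
  return countNaive(word)
}
```

`countNaive('advisedly')` gives 3, while `countFixed('advisedly')` and `countFixed('avowedly')` give 4; words without "-edly", such as "quickly", fall through to the naive path unchanged.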

Separately, "newlywed" doesn't follow the regular rules, so I'll just add it to problematic.json.

ibanner56 commented 6 years ago

Fixes #22 now as well.

This does not currently include a fix for #21. For now, I need to take some time to think about the best way to tackle that one.

ibanner56 commented 6 years ago

I should clarify that #19 was hiding #21 when the tests were being run. I've modified the tests so it's clear why they were failing in the first place, and to make them more specific to the cases they're testing.

wooorm commented 6 years ago

Hey hey! Thanks for all the work on this!

  1. I still see 4 errors in Travis, are you still working on those?
  2. I’m also wondering what issues in total are being fixed by this PR now? Could you summarise the changes when ready?
  3. I see some cases of apostrophes and upper-case in problematic.json. As the check of those only receives cleaned values, they are never matched, right? If so, could you either normalise those values or drop them?

Cheers, Titus

ibanner56 commented 6 years ago

As I mentioned above, the four failed tests are a result of #21. As of right now I don't plan on fixing that issue, since I don't have a simple solution in mind. It may have to do with some other special cases that you check for.

This fixes #18, #19, and #22.

Sure, I'll normalize problematic.json later today.

ibanner56 commented 6 years ago

Actually, I'm not sure what you mean about problematic.json - there are no apostrophes in the file, nor are there any capital letters. There are both of those in my additions to the test cases, but that's ideal.

wooorm commented 6 years ago

Oh whoops, I misread the diffs: those are actually the tests, not the problematics. Sorry!

ibanner56 commented 6 years ago

The problem with re- and diphthongs in general is that English is inconsistent. "Coin" works, but "coincident" doesn't, since there the vowels are meant to be pronounced separately. In "ream" the e and the a form a digraph, but in "reapply" they're separate.
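
A minimal sketch of why a single rule cannot cover both cases, assuming a vowel-group counter (the function names and the prefix rule are hypothetical): treating adjacent vowels as one group undercounts "reapply", while a rule that splits off a leading "re" before a vowel fixes that word but overcounts "ream".

```javascript
// Count groups of consecutive vowels, treating "y" as a vowel.
function countVowelGroups(word) {
  const groups = word.match(/[aeiouy]+/g)
  return groups ? groups.length : 0
}

// Hypothetical prefix rule: a leading "re" followed by a vowel is its
// own syllable, and the rest of the word is counted separately.
function countWithRePrefix(word) {
  if (/^re[aeiou]/.test(word)) {
    return 1 + countVowelGroups(word.slice(2))
  }
  return countVowelGroups(word)
}
```

`countVowelGroups('reapply')` gives 2 but `countWithRePrefix('reapply')` gives the correct 3; for "ream" it is the other way round (1 versus 2). The same merging explains "coincident": its vowel groups "oi", "i" and "e" give 3 rather than 4, while "coin" is correctly 1. Telling these apart needs word-level knowledge such as an exception list.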

wooorm commented 6 years ago

@ibanner56 Sorry for the late reply!

Sooo, what should we do to get this moving?

ibanner56 commented 6 years ago

Realistically, it's up to you. I'm not sure this is easily solvable without keeping a larger list of hard-coded "problematics". In my project, I've switched to using a master database of words-to-syllables as a catch-most, and I pass any words it can't find to syllable.
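
The dictionary-first approach described above could be sketched as follows. The `makeCounter` name and the dictionary entries are hypothetical; the fallback is passed in as a function, so it could be the syllable package's counter or any other rule-based counter.

```javascript
// Hypothetical: build a counter that consults a known-word dictionary
// first and only falls back to a rule-based counter for unknown words.
function makeCounter(dictionary, fallback) {
  return function count(word) {
    const key = word.toLowerCase()
    if (dictionary.has(key)) {
      return dictionary.get(key)
    }
    return fallback(key)
  }
}

// Illustrative entries only; a real dictionary would be far larger.
const dictionary = new Map([
  ['newlywed', 3],
  ['coincident', 4],
  ['reapply', 3],
])
```

The design keeps the hard cases out of the rules entirely: the algorithm only ever sees words the dictionary does not know, so a miss costs one `Map` lookup.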

For now, my suggestion would be to remove any words with a re- prefix from the tests, since I don't have the bandwidth to fix #21 and it's blocking the bugfixes in my PR from actually getting checked in.

wooorm commented 6 years ago

I finally took the time to work on it and figured it out: I reverted the words you removed, refactored a bit because the code was confusing, and now everything is back to what it used to be, plus your fixes!