coolbutuseless closed 5 years ago
I like it! But I think it is out of scope for syn, although I do really like it. It also looks like there aren't any additional dependencies or anything, so perhaps it makes sense to include it in the same package; otherwise it would involve duplicating work just to create a separate package.
I wonder if it would be worthwhile to consider a "words" set of R packages that emulate the words organisation. This would open up the scope for things like:
readable, which could use many of the different measurements of readability, for example Flesch-Kincaid.
OK so I reckon go ahead and add it!
But can you add the following:
I'm going to move my earlier thoughts into an issue.
I think this implementation is still a bit half-baked. Retracting it until I can make it better.
I had a think about homophones, and by the time I found a good data source and figured out what I could do with it, I ended up having a package!
https://github.com/coolbutuseless/phon wraps the CMU Pronouncing Dictionary and generates:
I think this makes a good orthogonal/companion package to syn, i.e. syn finds new words based upon meaning, phon finds new words based upon sound.
Love it!
This is maybe getting out-of-scope, and I won't be at all offended if this PR is rejected.
Homophones are calculated by considering the phonetic encoding of the all_words list. If two words have the same phonetic encoding then we consider them homophones.
The phonetic encoding algorithms used were DoubleMetaphone and Soundex.
It doesn't work too badly considering the diversity of English pronunciation; however, we might have to ignore matches for any entries with spaces in them. I believe the phonetic encoding just drops all spaces and treats the entry as a single word, which is definitely going to give bad matches.
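For anyone following along, here is a rough sketch of the grouping idea described above. It is written in Python rather than R and is not the code from this PR. It uses a hand-rolled American Soundex only (no DoubleMetaphone), a small made-up word list standing in for all_words, and hypothetical names (soundex, homophone_groups, skip_multiword). The skip_multiword flag illustrates the suggestion to ignore entries containing spaces.

```python
from collections import defaultdict

# Letter-to-digit table for American Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of `word` (hypothetical helper)."""
    # Non-letters (including spaces) are dropped, which is exactly why
    # multi-word entries can collide with single words.
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(ch, "")  # vowels get "" and reset `prev`
        if code and code != prev:
            encoded += code
        prev = code
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4


def homophone_groups(words, skip_multiword=True):
    """Group words sharing a Soundex code; keep groups of two or more."""
    groups = defaultdict(list)
    for w in words:
        if skip_multiword and " " in w:
            continue  # ignore entries with spaces, as suggested above
        groups[soundex(w)].append(w)
    return {code: ws for code, ws in groups.items() if len(ws) > 1}


# Made-up sample standing in for the package's all_words list.
sample = ["their", "there", "flour", "flower", "atoll", "at all", "cat"]
strict = homophone_groups(sample)
loose = homophone_groups(sample, skip_multiword=False)
print(strict)
print(loose)
```

With skip_multiword=False, "at all" collapses to the same code as "atoll", which is the kind of bad match described above. Note also that Soundex keeps the first letter verbatim, so pairs like "knight"/"night" are never grouped by this encoding.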
Created on 2018-11-28 by the reprex package (v0.2.1)