ropensci / hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R
https://docs.ropensci.org/hunspell

FR: add 'format = Rd' parsing capabilities #28

Closed MichaelChirico closed 6 years ago

MichaelChirico commented 7 years ago

The package hunspell could be a great tool for package authors to use to spell-check their R documentation. Unfortunately, format = 'man' doesn't seem to do the trick:

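library(hunspell)
library(magrittr)
# Raw Rd source of hunspell's own help page
URL = 'https://raw.githubusercontent.com/ropensci/hunspell/master/man/hunspell.Rd'
# Spell-check each line as 'man' input, then show the first 10 distinct flagged words
readLines(URL) %>% hunspell_find(format = 'man') %>%
    unlist %>% unique %>% head(10)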
#  [1] "roxygen"       "hunspell"      "dicpath"       "Hunspell"      "dict"          "lang"          "aff"           "dicationaries"
#  [9] "wordcloud"     "RdTextFilter" 

It only seems to be picking up on URLs and coding terms (and format = 'text' does roughly the same).


Hmm. I think I misunderstood what hunspell_find is doing (I thought it was a tokenizer to use before applying the spell checker, but it's the spell checker itself). Reviewing ?hunspell again, it's still unclear to me whether format = 'man' is the same as what format = 'Rd' would be. Or is format = 'man' intended to work as a parser for command-line man pages? The documentation could use some clarification.

jeroen commented 7 years ago

Have a look at the spelling package: https://ropensci.org/blog/technotes/2017/09/07/spelling-release
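
For readers who want to see the general idea without installing anything extra, here is a minimal sketch of spell-checking a single Rd file by stripping the Rd markup first and then handing the plain text to hunspell. It assumes base R's tools::RdTextFilter() as the markup filter. This is only one possible approach and is not necessarily how the spelling package works internally. The Rd content, the misspelling "sentense", and the variable names are made up for illustration.

```r
library(hunspell)

# Hypothetical, tiny Rd file written to a temp path (illustration only)
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{A demo page}",
  "\\description{This sentense contains a typo.}"
), rd_file)

# Strip Rd markup so only prose is left (assumption: RdTextFilter is a
# suitable filter for this purpose)
prose <- tools::RdTextFilter(rd_file)

# Check the remaining plain text; hunspell() returns one character
# vector of flagged words per input line
bad <- unique(unlist(hunspell(prose, format = "text")))
print(bad)  # the deliberate misspelling should be among the flagged words
```

Filtering first means the checker only sees prose, so Rd macros are not reported as misspellings.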

MichaelChirico commented 7 years ago

Oh, awesome. Trying it out now.