ropensci / hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R
https://docs.ropensci.org/hunspell

Handling a vector of dictionaries in get_dict #37

Closed meztez closed 5 years ago

meztez commented 5 years ago

In get_dict:

codecov-io commented 5 years ago

Codecov Report

Merging #37 into master will increase coverage by <.01%. The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #37      +/-   ##
==========================================
+ Coverage   39.81%   39.82%   +<.01%     
==========================================
  Files          28       28              
  Lines        8359     8360       +1     
==========================================
+ Hits         3328     3329       +1     
  Misses       5031     5031

Impacted Files    Coverage Δ
R/hunspell.R      81.81% <100%> (+0.18%) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 50e57e6...da358a9.

jeroen commented 5 years ago

Can you include example code of how you use this?

meztez commented 5 years ago

I use the code below:

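library(hunspell)
library(curl)

# Download a Firefox dictionary add-on (.xpi, a zip archive), extract its
# .aff and .dic files into ./dict, and return the path(s) to the .dic file(s).
extractdict <- function(url) {
  temp <- tempfile()
  on.exit(unlink(temp))  # remove the download even if a later step fails
  curl_download(url, temp)
  # Anchor the pattern so that only names ending in .aff or .dic match
  dicts <- grep("\\.(aff|dic)$", unzip(temp, list = TRUE)$Name, value = TRUE)
  unzip(temp, files = dicts, overwrite = TRUE, junkpaths = TRUE, exdir = "dict")
  file.path(".", "dict", basename(grep("\\.dic$", dicts, value = TRUE)))
}

dict_fr <- extractdict("https://addons.mozilla.org/firefox/downloads/file/1163947/french_spelling_dictionary-6.3.1webext.xpi")
dict_en <- extractdict("https://addons.mozilla.org/firefox/downloads/file/1163920/canadian_english_dictionary-3.0.6.1webext.xpi")

# With this PR applied, dict accepts a vector of dictionary paths
sc <- hunspell("some text", dict = c(dict_fr, dict_en))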
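
For comparison, a similar effect may be achievable without changing get_dict: check each word against every dictionary and accept it if any one of them recognises it. The following is a minimal sketch, not the code from this pull request. check_any is a hypothetical helper, the check argument exists only so the checker can be swapped out, and the usage lines assume the en_US and en_GB dictionaries that ship with hunspell.

```r
# Hypothetical helper: TRUE for each word accepted by at least one dictionary.
# `check` defaults to hunspell::hunspell_check(words, dict = ...).
check_any <- function(words, dicts, check = hunspell::hunspell_check) {
  # One logical vector per dictionary, each of length(words)
  hits <- lapply(dicts, function(d) check(words, dict = d))
  # Element-wise OR across dictionaries, starting from all FALSE
  Reduce(`|`, hits, rep(FALSE, length(words)))
}

# Usage, guarded so the block still runs when hunspell is not installed
if (requireNamespace("hunspell", quietly = TRUE)) {
  dicts <- lapply(c("en_US", "en_GB"), hunspell::dictionary)
  print(check_any(c("color", "colour", "colr"), dicts))
}
```

This only covers word-level checking; suggestions and stemming would each need their own way of combining results across dictionaries.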

Another test I made, and the reason I used sapply:

get_dict(c(dict_fr, dict_en, "en_US", "poison.dic"))
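
The idea behind the call above, mapping each element of a character vector to its own dictionary object, can be sketched with the exported dictionary() function instead of the internal get_dict. This is an illustration under stated assumptions, not the change proposed in this pull request: load_dicts is a hypothetical name, and lapply is used in place of sapply so that the result is always a list, whatever the loader returns.

```r
# Hypothetical helper: one dictionary object per element of `langs`.
# `loader` defaults to hunspell::dictionary, which takes a language name
# such as "en_US" or a path to a .dic file.
load_dicts <- function(langs, loader = hunspell::dictionary) {
  stopifnot(is.character(langs), length(langs) > 0)
  # lapply always returns a list, whereas sapply may simplify the result
  dicts <- lapply(langs, loader)
  names(dicts) <- langs
  dicts
}

# Usage, guarded so the block still runs when hunspell is not installed
if (requireNamespace("hunspell", quietly = TRUE)) {
  dicts <- load_dicts(c("en_US", "en_GB"))
  print(names(dicts))
}
```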

meztez commented 5 years ago

Don't merge yet. It works, but the output from print(dict) looks like garbage. Sorry.

meztez commented 5 years ago

I'm closing this. It causes a bunch of weird behavior, mostly in the affix part. This is not ready.