High-Performance Stemmer, Tokenizer, and Spell Checker for R
Handling a vector of dictionaries in get_dict #37

Closed: meztez closed this 5 years ago

meztez commented 5 years ago

In get_dict :

jeroen commented 5 years ago

Can you include example code of how you use this?

meztez commented 5 years ago

I use the code below:


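library(curl)
library(hunspell)

# Download a zipped dictionary, extract its .aff/.dic files into ./dict,
# and return the path(s) of the extracted .dic file(s).
extractdict <- function(url) {
  temp <- tempfile()
  curl_download(url, temp)
  # Match only names that end in .aff or .dic
  dicts <- grep("\\.(aff|dic)$", unzip(temp, list = TRUE)$Name, value = TRUE)
  unzip(temp, files = dicts, overwrite = TRUE, junkpaths = TRUE, exdir = "dict")
  return(paste0("./dict/", basename(grep("\\.dic$", dicts, value = TRUE))))
}

dict_fr <- extractdict("")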
dict_en <- extractdict("")

sc <- hunspell("some text", dict = c(dict_fr, dict_en))
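
For comparison, here is a minimal sketch of checking words against several dictionaries using only exported functions, under the assumption that the released package takes one dictionary per call. `check_any` is a hypothetical helper, not part of hunspell. It treats a word as correct when at least one of the dictionaries accepts it, which may or may not match what this change is meant to do.

```r
library(hunspell)

# Hypothetical helper (not part of hunspell): a word counts as correct
# when at least one of the supplied dictionaries accepts it.
check_any <- function(words, dicts) {
  # One logical vector per dictionary, each the same length as `words`.
  hits <- lapply(dicts, function(d) hunspell_check(words, dict = dictionary(d)))
  # Element-wise OR across dictionaries.
  Reduce(`|`, hits)
}

# "en_US" ships with the package; paths to .dic files such as dict_fr
# could be added to the vector in the same way.
check_any(c("hello", "zzzzqx"), c("en_US"))
```

Each dictionary is loaded and queried separately here, so affix rules from one dictionary are never applied to words from another.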

Another test I ran, which is why I used sapply:

get_dict(c(dict_fr, dict_en, "en_US", "poison.dic"))

meztez commented 5 years ago

Don't merge yet. It works, but the output from print(dict) looks like garbage. Sorry.

meztez commented 5 years ago

I'm closing this. It causes a bunch of weird behaviors, mostly in the affix part. This is not ready.