ropensci / hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R
https://docs.ropensci.org/hunspell
Other
109 stars 44 forks source link

Error when assigning dictionary to variable #31

Closed oloverm closed 6 years ago

oloverm commented 7 years ago

The first time I create a variable for a dictionary, it gives me an error. Using the dictionary later in hunspell works fine though. If I do it twice, the second time doesn't produce an error?

> acceptedWords <- read_excel("Custom dictionary.xlsx", sheet="accepted")$words %>%
+   tolower()
> customDict <- dictionary("en_GB", add_words = acceptedWords)
Error in unclass(x) : cannot unclass an external pointer
> customDict <- dictionary("en_GB", add_words = acceptedWords)
>
jeroen commented 7 years ago

I cannot reproduce this. What is in your acceptedWords ?

oloverm commented 7 years ago

It's just a character vector, I still get the error like this:

library("hunspell")
acceptedWords <- c("one", "two", "three")
customDict <- dictionary("en_GB", add_words = acceptedWords)
jeroen commented 7 years ago

I don't get this error. Are you using the latest version?

jeroen commented 7 years ago

Can you show the traceback() after getting the error?

oloverm commented 7 years ago

Ok I've narrowed it down a little, it only happens when I also have dplyr loaded. So in a new R session, what I wrote above doesn't produce the error, but this does:

library("dplyr")
library("hunspell")
acceptedWords <- c("one", "two", "three")
customDict <- dictionary("en_GB", add_words = acceptedWords)

This is what I get from traceback():

3: length.dictionary(obj)
2: length(obj)
1: (function (env, objName) 
   {
       obj <- get(objName, env)
       hasNullPtr <- .rs.hasNullExternalPointer(obj)
       if (hasNullPtr) {
           val <- "<Object with null pointer>"
           desc <- "An R object containing a null external pointer"
           size <- 0
           len <- 0
       }
       else {
           val <- "(unknown)"
           desc <- ""
           size <- object.size(obj)
           len <- length(obj)
       }
       class <- .rs.getSingleClass(obj)
       contents <- list()
       contents_deferred <- FALSE
       if (is.language(obj) || is.symbol(obj)) {
           val <- deparse(obj)
       }
       else if (!hasNullPtr) {
           if (size > 524288) {
               len_desc <- if (len > 1) 
                   paste(len, " elements, ", sep = "")
               else ""
               if (is.data.frame(obj)) {
                   val <- "NO_VALUE"
                   desc <- .rs.valueDescription(obj)
               }
               else {
                   val <- paste("Large ", class, " (", len_desc, 
                     capture.output(print(size, units = "auto")), 
                     ")", sep = "")
               }
               contents_deferred <- TRUE
           }
           else {
               val <- .rs.valueAsString(obj)
               desc <- .rs.valueDescription(obj)
               if (class == "data.table" || class == "ore.frame" || 
                   class == "cast_df" || class == "xts" || class == 
                   "DataFrame" || is.list(obj) || is.data.frame(obj) || 
                   isS4(obj)) {
                   contents <- .rs.valueContents(obj)
               }
           }
       }
       list(name = .rs.scalar(objName), type = .rs.scalar(class), 
           is_data = .rs.scalar(is.data.frame(obj)), value = .rs.scalar(val), 
           description = .rs.scalar(desc), size = .rs.scalar(size), 
           length = .rs.scalar(len), contents = contents, contents_deferred = .rs.scalar(contents_deferred))
   })(<environment>, "customDict")
jeroen commented 7 years ago

It is because the rlang package also defines a dictionary class + length.dictionary method.

> library(hunspell)
> x <- dictionary()

> library(rlang)
> print(x)
## Error: object of type 'externalptr' is not subsettable

However your traceback stalls in rstudio. You can probably fix the issue by updating rstudio.

@lionel- @hadley Is there some way this can be avoided, besides renaming the class?

lionel- commented 7 years ago

This is an example of why the S3 registration approach enforced by CRAN is terrible for encapsulation. The only way to make it work hygienically is to prefix the S3 classes with the package name. The prefixes would act as manual namespaces for S3.

The sad part is that lexically scoped methods work perfectly well, they're just discouraged by R CMD check.

As for the dictionary class in rlang I think it is not generally useful. We could think about renaming it to something more specific like eval_pronoun.

jeroen commented 7 years ago

I just renamed my class to hunspell_dictionary, it is only used to provide a pretty print method.

This seems like a serious problem with ever growing CRAN though. Somehow I thought that registering the classes and methods in the NAMESPACE would avoid this, but of course R doesn't know which dictionary class to use when an object has an attribute with class = "dictionary"

lionel- commented 7 years ago

Worst part is, merely loading a package causes the methods to be registered and can thus have far reaching effects. So you're at the mercy of the slightest changes in your dependency tree.

jeroen commented 6 years ago

Fixed in 2.8