ropensci / hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R
https://docs.ropensci.org/hunspell

hunspell in RStudio #21

Closed: tencnivel closed this issue 7 years ago

tencnivel commented 7 years ago

Hello,

I can't get hunspell to work in a script executed in RStudio:

library("hunspell")
words <- c("qqqqaaaa", "dddddd", "wine")
hunspell::hunspell_check(words,dict = dictionary("en_US"))

returns TRUE TRUE TRUE instead of the expected FALSE FALSE TRUE.

Is there something to change in the RStudio configuration?

Thanks!

Vincent

jeroen commented 7 years ago

What do you get for:

dictionary("en_US")

and

sessionInfo()

tencnivel commented 7 years ago
library("hunspell")
dictionary("en_US")

makes RStudio crash.

In rdesktop.log I see the following, but it's not obvious that this is what makes RStudio crash:

14 Mar 2017 13:52:35 [rdesktop] ERROR system error 111 (Connection refused); OCCURRED AT: void rstudio::core::http::LocalStreamAsyncClient::handleConnect(const boost::system::error_code&) /home/fedora/rstudio/src/cpp/core/include/core/http/LocalStreamAsyncClient.hpp:119; LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) /home/fedora/rstudio/src/cpp/desktop/DesktopNetworkReply.cpp:288

But the path /home/fedora/ does not exist on my machine.

jeroen commented 7 years ago

Which OS are you on?

tencnivel commented 7 years ago

Linux Fedora 25

jeroen commented 7 years ago

Does the same thing happen when running R in the console, or only in RStudio?

tencnivel commented 7 years ago

Only in RStudio.

jeroen commented 7 years ago

Strange. Might be a bug in rstudio. What do you see for:

hunspell:::dicpath()

tencnivel commented 7 years ago

From RStudio:

> hunspell:::dicpath()
 [1] "/home/vlaugier"                          "/usr/lib64/R/library/hunspell/dict"     
 [3] "/home/vlaugier/Library/Spelling"         "/usr/local/share/hunspell"              
 [5] "/usr/local/share/myspell"                "/usr/local/share/myspell/dicts"         
 [7] "/usr/share/hunspell"                     "/usr/share/myspell"                     
 [9] "/usr/share/myspell/dicts"                "/Library/Spelling"                      
[11] "/usr/lib/rstudio/resources/dictionaries"

From the R console:

> hunspell:::dicpath()
 [1] "/home/vlaugier"                     "/usr/lib64/R/library/hunspell/dict"
 [3] "/home/vlaugier/Library/Spelling"    "/usr/local/share/hunspell"
 [5] "/usr/local/share/myspell"           "/usr/local/share/myspell/dicts"
 [7] "/usr/share/hunspell"                "/usr/share/myspell"
 [9] "/usr/share/myspell/dicts"           "/Library/Spelling"
[11] "/dictionaries"

The last element is different: "/usr/lib/rstudio/resources/dictionaries" in RStudio versus "/dictionaries" in the console.

jeroen commented 7 years ago

I'll try to reproduce this. What version of R and RStudio do you use?

tencnivel commented 7 years ago

RStudio: version 1.0.136; R: version 3.3.2.

Thanks! Do you want me to post on Stack Overflow in case someone has an idea?

jlehtoma commented 7 years ago

I can confirm the issue on openSUSE Tumbleweed with R 3.4.0 and RStudio 1.1.226. It works fine from the R console.

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, linux-gnu           
 ui       RStudio (1.1.226)           
 language en_US                       
 collate  fi_FI.UTF-8                 
 tz       Europe/Amsterdam            
 date     2017-05-15                  

Packages ---------------------------------------------------------------------------------------------------------------------------
 package  * version date       source        
 crayon     1.3.2   2016-06-28 CRAN (R 3.4.0)
 devtools * 1.12.0  2016-12-05 CRAN (R 3.4.0)
 digest     0.6.12  2017-01-27 CRAN (R 3.4.0)
 hunspell * 2.4     2017-04-30 CRAN (R 3.4.0)
 magrittr   1.5     2014-11-22 CRAN (R 3.4.0)
 memoise    1.1.0   2017-04-21 CRAN (R 3.4.0)
 R6         2.2.0   2016-10-05 CRAN (R 3.4.0)
 Rcpp       0.12.10 2017-03-19 CRAN (R 3.4.0)
 testthat * 1.0.2   2016-04-23 CRAN (R 3.4.0)
 withr      1.0.2   2016-06-20 CRAN (R 3.4.0)
 yaml       2.1.14  2016-11-12 CRAN (R 3.4.0)
> hunspell::dicpath()
 [1] "/home/jlehtoma"                                                  
 [2] "/home/jlehtoma/R/x86_64-suse-linux-gnu-library/3.4/hunspell/dict"
 [3] "/home/jlehtoma/Library/Spelling"                                 
 [4] "/usr/local/share/hunspell"                                       
 [5] "/usr/local/share/myspell"                                        
 [6] "/usr/local/share/myspell/dicts"                                  
 [7] "/usr/share/hunspell"                                             
 [8] "/usr/share/myspell"                                              
 [9] "/usr/share/myspell/dicts"                                        
[10] "/Library/Spelling"                                               
[11] "/usr/lib/rstudio/resources/dictionaries"

kevinushey commented 7 years ago

I'm not able to reproduce the crashes discussed here (at least, on my Ubuntu 16.04 VM). Is there any chance we can get a gdb stack trace here? Try the following:

  1. Launch RStudio.
  2. In a terminal, attach gdb with: sudo gdb -p `pidof rsession` -ex continue
  3. Run the R code that triggers the crash.
  4. Return to the terminal; when gdb stops on the crash, type bt to print a backtrace.

There's a small chance here that the crash is occurring because RStudio and the hunspell package are using incompatible versions of the hunspell library.

jeroen commented 7 years ago

@kevinushey that's exactly what I thought. Does RStudio bundle its own version of libhunspell?

jeroen commented 7 years ago

It's strange that I can't reproduce this on my own Ubuntu 16.04. @kevinushey, could you check whether anything changes if you remove this line from the source code?

[Screenshot of the source line in question (image not preserved)]

kevinushey commented 7 years ago

We statically link Hunspell into the rsession binary, e.g.

kevin@KBOX:/usr/lib/rstudio/bin
$ nm rsession | grep _ZN8Hunspell
0000000000cd1940 T _ZN8Hunspell10cat_resultEPcS0_
0000000000ccd750 T _ZN8Hunspell10cleanword2EPcPKcP6w_charPiS5_S5_
0000000000cd1930 T _ZN8Hunspell10get_csconvEv
< ... >
0000000000ccd3c0 T _ZN8HunspellC2EPKcS1_S1_
0000000000ccd5a0 T _ZN8HunspellD1Ev
0000000000ccd5a0 T _ZN8HunspellD2Ev
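The `_ZN8Hunspell...` entries above are C++ symbols mangled according to the Itanium C++ ABI, which GCC and Clang use on Linux; `nm -C` or `c++filt` prints them as readable signatures. As a minimal sketch, assuming a GCC- or Clang-compatible toolchain that ships `<cxxabi.h>`, the same decoding can be done in code with `abi::__cxa_demangle`. The helper name `demangle` is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: decode an Itanium-ABI mangled name such as the ones
// printed by `nm rsession`. Returns the input unchanged if decoding fails.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer (or nullptr on failure).
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // the caller owns the buffer and must free it
    return result;
}
```

For example, `demangle("_ZN8HunspellD1Ev")` yields `Hunspell::~Hunspell()`, confirming that these are member functions of a class named `Hunspell` compiled into `rsession`.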

My guess is that the hunspell package is inadvertently binding to the same symbols that the rsession binary exports, and a crash occurs because the two versions are incompatible. If that's indeed the case, the fix would be for either RStudio or hunspell to hide these symbols in a separate, private namespace. (We actually had to do something very similar to fix crashes caused by multiple versions of Boost being in play with some R packages.)
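A private namespace helps because the enclosing namespace becomes part of every mangled name, so a bundled copy of a class no longer shares symbols with an identically named class elsewhere in the process. Below is a minimal sketch, assuming GCC or Clang; the class bodies and the namespace `rpkg_private` are hypothetical stand-ins, not the real Hunspell API or any actual fix.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical stand-in for the class the host binary already exports
// (its symbols look like _ZN8Hunspell...). Not the real Hunspell API.
class Hunspell {
public:
    std::string get_dict_encoding() const { return "host copy"; }
};

// Hypothetical private namespace for a package's bundled copy. Its mangled
// names embed the namespace (N12rpkg_private8Hunspell...), so they cannot
// collide with the host's symbols.
namespace rpkg_private {
class Hunspell {
public:
    std::string get_dict_encoding() const { return "bundled copy"; }
};
}  // namespace rpkg_private

// Returns the compiler's type name (a mangled name on GCC/Clang).
template <typename T>
std::string mangled_type_name() {
    return typeid(T).name();
}
```

Comparing `mangled_type_name<Hunspell>()` with `mangled_type_name<rpkg_private::Hunspell>()` shows two distinct names, which is what keeps the two copies apart at link time.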

jeroen commented 7 years ago

Hmm, but there are countless packages statically linking the same libraries (iconv, openssl, etc.). Usually this is not an issue as long as they live in a separate shared library (DLL). It's also strange that it works fine for me. Does this occur with every version of RStudio?

kevinushey commented 7 years ago

I can reproduce the crash on my CentOS 7 VM. Here's the stack trace I see:

(gdb) bt
#0  0x00007f9fe6678fdc in std::string::assign(std::string const&) ()
   from /lib64/libstdc++.so.6
#1  0x00007f9fbeaeb7fe in operator= (__str=..., this=0x2008aa8)
    at /usr/include/c++/4.8.2/bits/basic_string.h:547
#2  hunspell_dict (dicts=..., 
    affix=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x2008a90) at utils.h:40
#3  R_hunspell_dict (affix=..., dict=..., add_words=...) at check.cpp:7
#4  0x00007f9fbeae4462 in hunspell_R_hunspell_dict (
    affixSEXP=<optimized out>, dictSEXP=<optimized out>, 
    add_wordsSEXP=<optimized out>) at RcppExports.cpp:18
#5  0x00007f9fe75e5e9d in do_dotcall () from /usr/lib64/R/lib/libR.so
#6  0x00007f9fe7629751 in Rf_eval () from /usr/lib64/R/lib/libR.so
#7  0x00007f9fe762bea0 in do_begin () from /usr/lib64/R/lib/libR.so
#8  0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#9  0x00007f9fe762b1af in R_execClosure () from /usr/lib64/R/lib/libR.so
#10 0x00007f9fe76292f4 in Rf_eval () from /usr/lib64/R/lib/libR.so
#11 0x00007f9fe762cf8e in do_set () from /usr/lib64/R/lib/libR.so
#12 0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#13 0x00007f9fe762bea0 in do_begin () from /usr/lib64/R/lib/libR.so
#14 0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#15 0x00007f9fe762b1af in R_execClosure () from /usr/lib64/R/lib/libR.so
#16 0x00007f9fe76292f4 in Rf_eval () from /usr/lib64/R/lib/libR.so
#17 0x00007f9fe762cf8e in do_set () from /usr/lib64/R/lib/libR.so
#18 0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#19 0x00007f9fe762bea0 in do_begin () from /usr/lib64/R/lib/libR.so
#20 0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#21 0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#22 0x00007f9fe762bea0 in do_begin () from /usr/lib64/R/lib/libR.so
#23 0x00007f9fe7629529 in Rf_eval () from /usr/lib64/R/lib/libR.so
#24 0x00007f9fe762b1af in R_execClosure () from /usr/lib64/R/lib/libR.so
#25 0x00007f9fe76292f4 in Rf_eval () from /usr/lib64/R/lib/libR.so
#26 0x00007f9fe7652e12 in Rf_ReplIteration () from /usr/lib64/R/lib/libR.so
#27 0x00007f9fe76531f1 in R_ReplConsole () from /usr/lib64/R/lib/libR.so
#28 0x00007f9fe76532af in run_Rmainloop () from /usr/lib64/R/lib/libR.so
#29 0x0000000000c6c24a in rstudio::r::session::runEmbeddedR(rstudio::core::FilePath const&, rstudio::core::FilePath const&, bool, bool, SA_TYPE, rstudio::r::session::Callbacks const&, rstudio::r::session::InternalCallbacks*) ()
#30 0x0000000000c50409 in rstudio::r::session::run(rstudio::r::session::ROptions const&, rstudio::r::session::RCallbacks const&) ()
#31 0x000000000068e615 in main ()

From frames 2 and 3:

#2  hunspell_dict (dicts=..., 
    affix=<error reading variable: access outside bounds of object referenced via synthetic pointer>, this=0x32887d0) at utils.h:40
40      enc_ = pMS_->get_dict_encoding();
#3  R_hunspell_dict (affix=..., dict=..., add_words=...) at check.cpp:13
13    hunspell_dict *mydict = new hunspell_dict(affix, dict);

This should hopefully point us in the right direction...

jeroen commented 7 years ago

Hmm, get_dict_encoding() is indeed a relatively recent addition to the hunspell API, so there may be a conflict with an older version of libhunspell.

So to be clear, this problem appears on RStudio Server, right? Which version exactly? I have still not been able to make mine crash.

I have to go to sleep now; I'll look into this tomorrow.

kevinushey commented 7 years ago

I was running with CentOS 7 64bit + RStudio v1.0.143 (the official RPM from https://www.rstudio.com/products/rstudio/download/).

jeroen commented 7 years ago

OK, I've been able to reproduce this using the rocker/rstudio Docker image.

jeroen commented 7 years ago

I think I have found a workaround: compiling the bundled libhunspell with __attribute__((__visibility__("hidden"))) so that its symbols are not exported.
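The workaround relies on ELF symbol visibility in GCC and Clang: a hidden symbol is usable inside the shared library that defines it but is left out of that library's dynamic symbol table, and references to it from within the library bind to the library's own definition, so they cannot be interposed by a same-named symbol that the host process (here `rsession`) exports. A minimal sketch under that assumption follows; the macro `BUNDLED_API` and both functions are hypothetical and do not reproduce the actual commit.

```cpp
#include <string>

// Hypothetical export macro for a bundled library copy. On GCC/Clang it marks
// declarations hidden: callable inside the shared object that defines them,
// but not exported from it.
#if defined(__GNUC__) || defined(__clang__)
#define BUNDLED_API __attribute__((__visibility__("hidden")))
#else
#define BUNDLED_API
#endif

// Hypothetical function standing in for a bundled libhunspell entry point.
BUNDLED_API std::string bundled_dict_encoding() {
    return "UTF-8";
}

// Code in the same shared library still calls it normally; the call binds
// to this definition rather than to an identically named symbol elsewhere.
std::string package_entry_point() {
    return "encoding: " + bundled_dict_encoding();
}
```

Compiling with `-fvisibility=hidden` applies the same default to every symbol in a translation unit, and `nm -D` on the resulting shared library would no longer list the hidden symbols.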

jeroen commented 7 years ago

I am hopeful this is fixed in https://github.com/ropensci/hunspell/commit/0acde4f6cd115a68da6f75ca8d9df505b661c008.

kevinushey commented 7 years ago

Can confirm that this fixes the issue in my test environment as well. Thanks @jeroen!

jeroen commented 7 years ago

The fixed version is on CRAN now. Thanks everyone for your patience.