Closed ireyoner closed 8 years ago
Right that makes sense. Can you send a PR?
By PR do you mean a pull request? If so, it is not possible for me right now; I will do it next week. Although I know what to fix, I have no idea how it should be implemented, so I must look for a solution on the Internet.
I was also looking through other issues and came to the conclusion that issue #7, "hunspell_suggest is slow", has the same root cause as the one I reported.
I think I fixed it. The new version works for me when I call document(), but I get the same error as in issue #12. I sent a pull request with my changes.
Thanks a lot for your thoughts and suggestions. Your PR had some nice ideas, but I prefer a lighter and simpler solution.
I added a solution which makes dictionaries a separate object and automatically caches them via memoise. Let me know if this solution works for you!
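A minimal sketch of that caching pattern, assuming the memoise package; the helper names here are hypothetical, not the package's actual internals:

```r
library(memoise)

# Expensive step: loads the .aff/.dic files and builds the underlying
# C++ dictionary object (hypothetical wrapper around hunspell::dictionary).
make_dictionary <- function(lang) {
  hunspell::dictionary(lang)
}

# Memoised wrapper: the dictionary is built once per language,
# then served from an in-memory cache on every subsequent call.
get_dictionary <- memoise(make_dictionary)

d1 <- get_dictionary("en_US")  # builds the dictionary
d2 <- get_dictionary("en_US")  # returned from cache, no rebuild
```

With this approach the expensive load happens only on the first call for each language, which is what removes the per-call overhead discussed in this issue.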
I think it is still not working. After installing from CRAN via install.packages("hunspell")
and testing using microbenchmark(hunspell_info())
I got:
Unit: seconds
            expr      min       lq     mean   median       uq      max neval
 hunspell_info() 13.40211 13.84603 13.99256 13.98134 14.16155 14.57229   100
but I see that those changes are not on CRAN yet.
Then, after installing from GitHub via install_github("ropensci/hunspell")
and trying to call hunspell_info()
I first got this error:
Error in .Call("hunspell_R_hunspell_dict", PACKAGE = "hunspell", affix, :
"hunspell_R_hunspell_dict" not available for .Call() for package "hunspell"
and then another error:
Error: cannot allocate vector of size 3.4 Gb
In addition: Warning messages:
1: In fetch(key) :
Reached total allocation of 4086Mb: see help(memory.size)
and then the R session crashed. I didn't do anything beyond what is described above in this session. All function calls used your default dictionaries.
Something must have gone wrong with your installation; it passes on all platforms. Can you try to reinstall, making sure hunspell is not loaded while you reinstall?
devtools::install_github("ropensci/hunspell")
OK, now it is working fine. I don't know what the problem was earlier. Thanks for the good work. Will you close the issue, or should I?
If this all works for you now and you have no further concerns, you can close the issue.
OK thanks again for your suggestions!
You are welcome. The only thing old users might find problematic is that, for other languages, the dict parameter has changed from a string to an object. Maybe it would be better to call the dictionary("en_US") function inside each "basic" function?
It should be backward compatible. If you pass a string, it will get passed on to dictionary(), so you get the old behaviour.
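A sketch of both call styles, assuming the bundled en_US dictionary; exact argument handling may differ between package versions:

```r
library(hunspell)

# Old style: pass the language as a string; it is forwarded to dictionary()
hunspell_check(c("beer", "wiskey"), dict = "en_US")

# New style: build the dictionary object once and reuse it across calls,
# avoiding repeated dictionary loads
en <- dictionary("en_US")
hunspell_check(c("beer", "wiskey"), dict = en)
```

Both forms should return the same logical vector; the second simply makes the dictionary reuse explicit.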
FYI this is on CRAN now.
Calling the functions is very slow, even with a single word: the C++ dictionary class is created on every function call, which is very time consuming.
If you create the dictionary only once, on the first call for each language, and then pass that object to the functions, it will be much faster.
I have ~1000 short texts, and calling hunspell_stem on each takes ~1.4 sec per call. After modifying your function code to take a list of vectors as an argument,
execution takes ~10 sec for all texts instead of ~24 min.
But then I need to call hunspell_suggest or hunspell_analyze depending on the text, and each of those calls again takes ~1.4 sec.
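The speed-up described above comes from batching many words into one call, so the dictionary is only loaded once instead of per call. A sketch of the two patterns (the word vector is placeholder data; timings will vary by machine and package version):

```r
library(hunspell)

words <- c("cars", "running", "walked")  # placeholder input

# Slow pattern: one call per word; in old package versions each call
# rebuilt the C++ dictionary object from scratch
stems_slow <- lapply(words, hunspell_stem)

# Fast pattern: a single vectorized call; the dictionary is loaded once
# and reused for every word
stems_fast <- hunspell_stem(words)
```

The same batching applies to hunspell_suggest and hunspell_analyze, which also accept a character vector of words.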