Open haven-jeon opened 6 years ago
Hi @haven-jeon and @junhewk! I am impressed by how quickly you published your package on CRAN. I ran a test on Windows and Linux. It seems to have some issue on Windows (or I am doing something wrong). It works smoothly on Linux, but the output of pos() seems too large (posParallel() looks fine).
> require(quanteda)
> require(RcppMeCab)
> #devtools::install_github("quanteda/quanteda.corpora")
> require(quanteda.corpora)
>
> corp <- download("data_corpus_foreignaffairscommittee")
> txt <- tail(texts(corp), 1000)
>
> pos(txt[1], join = FALSE)
Exception:
Error in posRcpp(sentence, sys_dic, user_dic) :
Not compatible with STRSXP: [type=NULL].
> pos(txt[1], join = TRUE, sys_dic = "C:\Program Files (x86)\MeCab\dic\ipadic")
Error: '\P' is an unrecognized escape in character string starting ""C:\P"
> pos(txt[1], join = TRUE, sys_dic = "C://Program Files (x86)//MeCab//dic//ipadic")
Exception:
Error in posJoinRcpp(sentence, sys_dic, user_dic) :
Not compatible with STRSXP: [type=NULL].
> pos(txt[1], join = TRUE, sys_dic = "C:/Program Files (x86)/MeCab/dic/ipadic")
Exception:
Error in posJoinRcpp(sentence, sys_dic, user_dic) :
Not compatible with STRSXP: [type=NULL].
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quanteda.corpora_0.85 RcppMeCab_0.0.1.1 quanteda_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 knitr_1.20 magrittr_1.5 stopwords_0.9.0 munsell_0.5.0 colorspace_1.3-2
[7] lattice_0.20-35 rlang_0.2.1 fastmatch_1.1-0 stringr_1.3.1 plyr_1.8.4 tools_3.5.0
[13] grid_3.5.0 data.table_1.11.4 gtable_0.2.0 xfun_0.2 spacyr_0.9.9 htmltools_0.3.6
[19] RcppParallel_4.4.0 yaml_2.1.19 lazyeval_0.2.1 rprojroot_1.3-2 digest_0.6.15 tibble_1.4.2
[25] bookdown_0.7 Matrix_1.2-14 ggplot2_2.2.1 evaluate_0.10.1 rmarkdown_1.10 blogdown_0.6
[31] stringi_1.1.7 pillar_1.2.3 compiler_3.5.0 scales_0.5.0 backports_1.1.2 lubridate_1.7.4
> require(quanteda)
> require(RcppMeCab)
> #devtools::install_github("quanteda/quanteda.corpora")
> require(quanteda.corpora)
>
> corp <- download("data_corpus_foreignaffairscommittee")
> txt <- tail(texts(corp), 1000)
>
> out <- posParallel(txt)
> object.size(out)
7046928 bytes
>
> out2 <- pos(txt)
> object.size(out2)
1005692696 bytes
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User Edition 5.13
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quanteda.corpora_0.85 RcppMeCab_0.0.1.1 quanteda_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 RMeCab_0.99999 knitr_1.20 magrittr_1.5 stopwords_0.9.0
[6] munsell_0.5.0 colorspace_1.3-2 lattice_0.20-35 rlang_0.2.1 fastmatch_1.1-0
[11] stringr_1.3.1 plyr_1.8.4 tools_3.4.4 grid_3.4.4 data.table_1.11.4
[16] gtable_0.2.0 xfun_0.2 spacyr_0.9.91 htmltools_0.3.6 RcppParallel_4.4.0
[21] yaml_2.1.19 lazyeval_0.2.1 rprojroot_1.3-2 digest_0.6.15 tibble_1.4.2
[26] bookdown_0.7 Matrix_1.2-14 ggplot2_2.2.1 evaluate_0.10.1 rmarkdown_1.10
[31] blogdown_0.6 stringi_1.2.3 pillar_1.2.3 compiler_3.4.4 scales_0.5.0
[36] backports_1.1.2 lubridate_1.7.4
The package looks good already, but my suggestion/request is to set sys_dic using options().
I will present at TokyoR on the 15th of next month. I will highlight your package in my talk. There will be people from RStudio too.
Thank you so much, @haven-jeon and @koheiw! I tested and revised the package.
sys_dic is now working properly.
There was a problem in the pos() function. I fixed it.
I added options(mecabSysDic=) to preserve the user's preferred MeCab system dictionary.
> library(quanteda)
> library(quanteda.corpora)
> library(RcppMeCab)
>
> corp <- download("data_corpus_foreignaffairscommittee")
> txt <- tail(texts(corp), 1000)
>
> out <- posParallel(txt)
> out2 <- pos(txt)
>
> object.size(out)
8831024 bytes
> object.size(out2)
8831024 bytes
>
> out[1000]
$`○三ッ矢委員長 以上で説明は終わりました。\n 次回は、公報をもってお知らせすることとし、本日は、これにて散会いたします。\n 午後一時十一分散会\n`
[1] "○/記号" "三ッ矢/名詞" "委員/名詞" "長/名詞" " /記号" "以上/名詞"
[7] "で/助詞" "説明/名詞" "は/助詞" "終わり/動詞" "まし/助動詞" "た/助動詞"
[13] "。/記号" " /記号" "次回/名詞" "は/助詞" "、/記号" "公報/名詞"
[19] "を/助詞" "もっ/動詞" "て/助詞" "お知らせ/名詞" "する/動詞" "こと/名詞"
[25] "と/助詞" "し/動詞" "、/記号" "本日/名詞" "は/助詞" "、/記号"
[31] "これ/名詞" "にて/助詞" "散会/名詞" "いたし/動詞" "ます/助動詞" "。/記号"
[37] " /記号" " /記号" " /記号" " /記号" "午後/名詞" "一/名詞"
[43] "時/名詞" "十/名詞" "一/名詞" "分/名詞" "散会/名詞"
> out2[1000]
$`○三ッ矢委員長 以上で説明は終わりました。\n 次回は、公報をもってお知らせすることとし、本日は、これにて散会いたします。\n 午後一時十一分散会\n`
[1] "○/記号" "三ッ矢/名詞" "委員/名詞" "長/名詞" " /記号" "以上/名詞"
[7] "で/助詞" "説明/名詞" "は/助詞" "終わり/動詞" "まし/助動詞" "た/助動詞"
[13] "。/記号" " /記号" "次回/名詞" "は/助詞" "、/記号" "公報/名詞"
[19] "を/助詞" "もっ/動詞" "て/助詞" "お知らせ/名詞" "する/動詞" "こと/名詞"
[25] "と/助詞" "し/動詞" "、/記号" "本日/名詞" "は/助詞" "、/記号"
[31] "これ/名詞" "にて/助詞" "散会/名詞" "いたし/動詞" "ます/助動詞" "。/記号"
[37] " /記号" " /記号" " /記号" " /記号" "午後/名詞" "一/名詞"
[43] "時/名詞" "十/名詞" "一/名詞" "分/名詞" "散会/名詞"
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppMeCab_0.0.1.2 quanteda.corpora_0.85 quanteda_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 magrittr_1.5 devtools_1.13.5 stopwords_0.9.0 munsell_0.5.0
[6] colorspace_1.3-2 lattice_0.20-35 R6_2.2.2 rlang_0.2.1 fastmatch_1.1-0
[11] stringr_1.3.1 httr_1.3.1 plyr_1.8.4 tools_3.5.0 grid_3.5.0
[16] data.table_1.11.4 gtable_0.2.0 spacyr_0.9.9 git2r_0.21.0 withr_2.1.2
[21] lazyeval_0.2.1 RcppParallel_4.4.0 digest_0.6.15 tibble_1.4.2 Matrix_1.2-14
[26] ggplot2_2.2.1 curl_3.2 memoise_1.1.0 stringi_1.2.2 compiler_3.5.0
[31] pillar_1.2.3 scales_0.5.0 lubridate_1.7.4
Could you try this revised version on GitHub, @koheiw?
You should call Sys.setenv() before installing, so that the Japanese DLLs are installed for the package.
Sys.setenv(MECAB_LANG='jp')
devtools::install_github("junhewk/RcppMeCab")
Sounds promising, but there is a system dependency. On my Windows machine, installation from GitHub fails. How does the package find the location of MeCab?
> Sys.setenv(MECAB_LANG='jp')
> devtools::install_github("junhewk/RcppMeCab")
-lR
installing to C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab/libs/x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
converting help for package 'RcppMeCab'
finding HTML links ... done
RcppMeCab html
pos html
posParallel html
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
Error: package or namespace load failed for 'RcppMeCab' in inDL(x, as.logical(local), as.logical(now), ...):
unable to load shared object 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab/libs/x64/RcppMeCab.dll':
LoadLibrary failure: The specified module could not be found.
Error: loading failed
Execution halted
ERROR: loading failed for 'x64'
* removing 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab'
* restoring previous 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab'
In R CMD INSTALL
By the way, Sys.setenv(MECAB_LANG='ja') would be more appropriate, because 'jp' is Japan's country code, whereas 'ja' is the language code for Japanese.
I changed the environment variable as you said. Now, Sys.setenv(MECAB_LANG="ja") will work.
I also tried installing the package from GitHub on my Windows 10 running under Parallels (in the output below, I removed the compiler messages).
> Sys.setenv(MECAB_LANG="ja")
> devtools::install_github("junhewk/RcppMeCab")
Downloading GitHub repo junhewk/RcppMeCab@master
from URL https://api.github.com/repos/junhewk/RcppMeCab/zipball/master
Installing RcppMeCab
"C:/PROGRA~1/R/R-35~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD \
INSTALL "C:/Users/jk/AppData/Local/Temp/RtmpkXLar3/devtools1becc323f16/junhewk-RcppMeCab-d7b786b" \
--library="C:/Users/jk/Documents/R/win-library/3.5" --install-tests
* installing *source* package 'RcppMeCab' ...
** libs
*** arch - i386
*** arch - x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (RcppMeCab)
In R CMD INSTALL
The R package installer downloads precompiled MeCab DLL files (they are located in the Release) before compiling each cpp file. If the value of MECAB_LANG is "ja", the installer downloads mecab32_ja.tar.gz and mecab64_ja.tar.gz into the i386 and x64 subfolders of the compiling directory. On Windows, the package can be installed without the MeCab library (though the library will of course be needed when the user wants to run the functions).
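To make the naming scheme above concrete, here is a minimal sketch of how the two archive URLs could be derived from MECAB_LANG. The helper name mecab_dll_urls() is hypothetical and this is not the package's actual install script; the release tag "0.0.1.0" is taken from the download error message quoted later in this thread and may differ for other versions. Nothing is downloaded.

```r
# Hypothetical helper (an illustration, not RcppMeCab's real installer code).
# It only builds the archive URLs for each Windows architecture.
mecab_dll_urls <- function(lang = Sys.getenv("MECAB_LANG"),
                           release = "0.0.1.0") {  # tag assumed from this thread
  if (!nzchar(lang)) stop("MECAB_LANG is not set")
  base <- "https://github.com/junhewk/RcppMeCab/releases/download"
  # One archive per architecture: mecab32_<lang>.tar.gz and mecab64_<lang>.tar.gz
  files <- sprintf("mecab%s_%s.tar.gz", c("32", "64"), lang)
  urls <- paste(base, release, files, sep = "/")
  # Name each URL after the subfolder it is unpacked into
  names(urls) <- c("i386", "x64")
  urls
}

Sys.setenv(MECAB_LANG = "ja")
urls <- mecab_dll_urls()
print(urls)
```

Printing the URLs this way is a quick check that the language code is spelled the way the release assets expect, before running the real installation.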
It passes CRAN win-builder (and for the update, I didn't change anything in the Makevars.win file). I'm sorry to ask, but could you describe your Windows installation environment?
No worries. I am happy to do more tests. The package compiles if I build the package from the source. I think these warnings explain the installation failure.
* removing 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab'
* restoring previous 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab'
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\Kohei\Documents\R\win-library\3.5\00LOCK-junhewk-RcppMeCab-510d53c\RcppMeCab\libs\x64\libmecab.dll to C:\Users\Kohei\Documents\R\win-library\3.5\RcppMeCab\libs\x64\libmecab.dll: Permission denied
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\Kohei\Documents\R\win-library\3.5\00LOCK-junhewk-RcppMeCab-510d53c\RcppMeCab\libs\x64\RcppMeCab.dll to C:\Users\Kohei\Documents\R\win-library\3.5\RcppMeCab\libs\x64\RcppMeCab.dll: Permission denied
I think these are the environment variables you need.
R_ARCH /x64
R_COMPILED_BY gcc 4.9.3
R_DOC_DIR C:/PROGRA~1/R/R-35~1.0/doc
R_HOME C:/PROGRA~1/R/R-35~1.0
R_LIBS_USER C:/Users/Kohei/Documents/R/win-library/3.5
R_USER C:/Users/Kohei/Documents
RCPP_PARALLEL_NUM_THREADS 2
READYAPPS C:\ProgramData\Lenovo\ReadyApps
RMARKDOWN_MATHJAX_PATH C:/Program Files/RStudio/resources/mathjax-26
RS_LOCAL_PEER \\.\pipe\34851-rsession
RS_RPOSTBACK_PATH C:/Program Files/RStudio/bin/rpostback
RS_SHARED_SECRET 63341846741
RSTUDIO 1
RSTUDIO_CONSOLE_COLOR 256
RSTUDIO_CONSOLE_WIDTH 80
RSTUDIO_MSYS_SSH C:/Program Files/RStudio/bin/msys-ssh-1000-18
RSTUDIO_PANDOC C:/Program Files/RStudio/bin/pandoc
RSTUDIO_SESSION_PORT 34851
RSTUDIO_USER_IDENTITY Kohei
RSTUDIO_WINUTILS C:/Program Files/RStudio/bin/winutils
I searched for the problem you mentioned and found these discussions on SO:1 and SO:2. They say it is caused by antivirus protection or account permissions. Downloading the source and using install_local() should solve the problem, I believe. Could you try this?
I'm working on returning the result as a data.frame, following your proposal. Thanks for your help!
I could solve the issue just by restarting the R session before installing (I have no third-party antivirus software on this machine). Installation gets very close to the end now, but there is one more error to tackle.
Error: package or namespace load failed for 'RcppMeCab' in inDL(x, as.logical(local), as.logical(now), ...):
unable to load shared object 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab/libs/x64/RcppMeCab.dll':
LoadLibrary failure: The specified module could not be found.
Error: loading failed
Execution halted
ERROR: loading failed for 'x64'
* removing 'C:/Users/Kohei/Documents/R/win-library/3.5/RcppMeCab'
I also noticed that the installer downloads mecab32 instead of mecab64. I wonder if this is related to the above error. Here I am setting a random string xxx to trigger the download error message.
Error in download.file(url = "https://github.com/junhewk/RcppMeCab/releases/download/0.0.1.0/mecab32_xxx.tar.gz", :
cannot open URL 'https://github.com/junhewk/RcppMeCab/releases/download/0.0.1.0/mecab32_xxx.tar.gz'
Thanks. It works on one of the Windows machines!
> require(quanteda)
> require(RcppMeCab)
> require(quanteda.corpora)
> corp <- download("data_corpus_foreignaffairscommittee")
> txt <- tail(texts(corp), 1000)
> pos(txt[1], join = FALSE)
$...
記号 名詞 記号 名詞 記号 名詞 記号 名詞 助詞 動詞
"○" "宮本" "(" "徹" ")" "委員" " " "内容" "について" "差し控え"
動詞 助詞 助詞 助動詞 助詞 記号 名詞 動詞 助動詞 助詞
"させ" "て" "じゃ" "なく" "て" "、" "確認" "す" "べき" "じゃ"
助動詞 助詞 助詞 名詞 助詞 動詞 助詞 動詞 名詞 助動詞
"ない" "か" "という" "こと" "を" "言っ" "て" "いる" "わけ" "です"
助詞 記号 記号 名詞 助詞 連体詞 名詞 助詞 動詞 助詞
"よ" "。" " " "沖縄" "の" "あの" "事故" "を" "受け" "て"
記号 名詞 助詞 記号 連体詞 名詞 助詞 名詞 名詞 助詞
"、" "皆さん" "が" "、" "その" "運用" "の" "安全" "性" "を"
名詞 動詞 助詞 動詞 記号 名詞 動詞 助詞 動詞 助詞
"確認" "し" "て" "いる" "、" "確認" "し" "て" "いる" "という"
名詞 助詞 動詞 助詞 動詞 名詞 助動詞 助詞 記号 副詞
"こと" "を" "言っ" "て" "いる" "わけ" "です" "けれども" "、" "実際"
動詞 助詞 動詞 助動詞 名詞 名詞 助詞 名詞 助詞 名詞
"出" "て" "き" "た" "米" "軍" "の" "マニュアル" "という" "の"
助詞 記号 名詞 名詞 名詞 助詞 名詞 名詞 助動詞 名詞
"は" "、" "空中" "給油" "訓練" "で" "破滅" "的" "な" "影響"
助詞 名詞 助詞 動詞 動詞 名詞 助詞 動詞 名詞 助動詞
"が" "結果" "として" "もたらさ" "れる" "危険" "が" "ある" "ん" "だ"
助詞 名詞 助詞 動詞 動詞 助詞 動詞 助動詞 名詞 助動詞
"という" "の" "が" "書か" "れ" "て" "い" "た" "わけ" "です"
助詞 助詞 記号 名詞 助詞 記号 副詞 記号 名詞 助詞
"よ" "ね" "。" "それ" "を" "、" "なぜ" "、" "アメリカ" "に"
名詞 名詞 助詞 名詞 助詞 動詞 助詞 副詞 動詞 助詞
"自衛隊" "員" "の" "皆さん" "が" "行っ" "て" "実際" "見" "て"
動詞 助詞 助詞 動詞 助動詞 記号 名詞 助詞 動詞 助動詞
"いる" "に" "も" "かかわら" "ず" "、" "それ" "を" "つかも" "う"
助詞 動詞 助動詞 名詞 助詞 記号 名詞 助詞 動詞 助動詞
"と" "し" "ない" "の" "か" "、" "そこ" "が" "わから" "ない"
助動詞 助詞 記号 記号 動詞 助動詞 助動詞 名詞 助動詞 助詞
"です" "よ" "。" " " "知り" "たく" "ない" "ん" "です" "か"
記号 名詞 名詞 助詞 名詞 助詞 動詞 助動詞 助動詞 助詞
"。" "危険" "性" "に" "目" "を" "向け" "たく" "ない" "という"
名詞 助動詞 助詞 記号 副詞 助動詞 助詞 記号 名詞 接頭詞
"こと" "です" "か" "。" "どう" "です" "か" "、" "若宮" "副"
名詞 記号
"大臣" "。"
Cool! So happy to hear that.
I also want your advice about format of the resulting data frame.
This is what I got for the temporary version:
> library(corpus)
> temp <- "○三ッ矢委員長 以上で説明は終わりました。\n 次回は、公報をもってお知らせすることとし、本日は、これにて散会いたします。\n 午後一時十一分散会\n"
> print.corpus_frame(as.data.frame(pos(c(txt1=temp), sys_dic="", user_dic="")))
doc_id sentence_id token_id token pos subtype
1 txt1 1 1 ○ 記号 一般
2 txt1 1 2 三ッ矢 名詞 固有名詞
3 txt1 1 3 委員 名詞 一般
4 txt1 1 4 長 名詞 接尾
5 txt1 1 5 以上 名詞 非自立
6 txt1 1 6 で 助詞 格助詞
7 txt1 1 7 説明 名詞 サ変接続
8 txt1 1 8 は 助詞 係助詞
9 txt1 1 9 終わり 動詞 自立
10 txt1 1 10 まし 助動詞
11 txt1 1 11 た 助動詞
12 txt1 1 12 。 記号 句点
13 txt1 2 1 次回 名詞 副詞可能
14 txt1 2 2 は 助詞 係助詞
15 txt1 2 3 、 記号 読点
16 txt1 2 4 公報 名詞 一般
17 txt1 2 5 を 助詞 格助詞
18 txt1 2 6 もっ 動詞 自立
19 txt1 2 7 て 助詞 接続助詞
20 txt1 2 8 お知らせ 名詞 サ変接続
...
(I used the corpus library to print UTF-8 characters in the data frame correctly on Windows.)
As you may know, MeCab returns several values for each morpheme: 品詞 (part of speech), 品詞細分類1, 品詞細分類2, 品詞細分類3 (part-of-speech subcategories 1 to 3), 活用型 (conjugation type), 活用形 (conjugation form), 原形 (base form), 読み (reading), 発音 (pronunciation). I used 品詞 and 品詞細分類1 for the temporary output. (In the Korean version, these are the part-of-speech value and its subtype.) Is that okay for analyzing Japanese? The problem is that the Korean and Japanese MeCab results differ, so I have to compromise.
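As a rough illustration of the field layout listed above, the following sketch splits one comma-separated feature string into named fields and picks out the two used for the data frame. The function name parse_feature(), the English field labels, and the sample feature string are my own assumptions for illustration; the sketch also assumes that unused fields are written as "*", as in the IPA dictionary.

```r
# English labels for the nine fields, in the order listed above (labels are mine).
feature_fields <- c("pos", "subtype1", "subtype2", "subtype3",
                    "conj_type", "conj_form", "base", "reading", "pronunciation")

# Hypothetical helper: split one feature string into a named character vector.
parse_feature <- function(feature) {
  parts <- strsplit(feature, ",", fixed = TRUE)[[1]]
  parts[parts == "*"] <- NA_character_       # assumed marker for an unused field
  length(parts) <- length(feature_fields)    # pad short input with NA, drop extras
  setNames(parts, feature_fields)
}

# Illustrative feature string for the token 説明 (not copied from real output).
parsed <- parse_feature("名詞,サ変接続,*,*,*,*,説明,セツメイ,セツメイ")
print(parsed[c("pos", "subtype1")])
```

Keeping all nine fields in the parsed vector and selecting columns afterwards would let the Japanese output expose more detail later without changing the Korean-compatible pos/subtype columns.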
Hi! I believe I've managed to install your package, but I get an error when I try to run pos() with Japanese text:
> pos("これはぺんです", join = FALSE)
> Exception:
> list()
> Error in print.function(args(obj)) :
> invalid multibyte string at '<ff><fe><61>ny<ff><fe>")
Could this be a problem related to character encoding?
Hi @DrMaphuse ,
I think it is an encoding problem, but I can't reproduce it in my environment.
There is no print function inside pos, so it might be a problem in the R environment that occurs when the console tries to print the result.
Can you save the result with result <- pos("これはぺんです", join = FALSE)?
I also recommend using iconv("これはぺんです", from="SHIFT-JIS", to="UTF-8").
RcppMeCab gets a character vector directly from R (via an Rcpp vector type), processes the string, and returns the result in UTF-8 encoding.
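One way to act on the iconv() suggestion without double-converting text that is already UTF-8 is to check validity first. The sketch below uses only base R; the helper name ensure_utf8() is hypothetical, and the default source encoding "SHIFT-JIS" is an assumption carried over from the suggestion above, so it should be changed if the input comes from somewhere else.

```r
# Hypothetical helper: return `x` as UTF-8, converting only the elements
# whose bytes are not already valid UTF-8.
ensure_utf8 <- function(x, from = "SHIFT-JIS") {  # `from` is an assumption
  bad <- !validUTF8(x)
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
  Encoding(x) <- "UTF-8"   # mark the strings so R treats them as UTF-8
  x
}

# "これはぺんです" written with escapes so the script itself stays ASCII.
txt <- "\u3053\u308c\u306f\u307a\u3093\u3067\u3059"
sjis <- iconv(txt, from = "UTF-8", to = "SHIFT-JIS")  # simulate Shift-JIS input

out <- ensure_utf8(c(txt, sjis))
print(Encoding(out))
print(validUTF8(out))
```

The converted vector could then be passed to pos(); inspecting Encoding() and validUTF8() on the input first is a cheap way to tell an encoding problem apart from a dictionary or installation problem.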
Thanks for your input! I have tried your suggestion, but unfortunately, the output is a List of 0.
Regarding iconv(), is this necessary even if my MeCab and my R script files are already in UTF-8?
That's a little strange. I thought that your input environment was SHIFT-JIS or some other Japanese encoding that uses multibyte characters (as discussed in this Devtools Issue). If you are feeding UTF-8 into the function, I can't see what the problem is.
@DrMaphuse, could you paste the result of sessionInfo() from your R console?
Sure!
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppMeCab_0.0.1.2
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 Rcpp_0.12.17 RcppParallel_4.4.1
@DrMaphuse , I'm so sorry that I can't reproduce your problem.
> Sys.setlocale("LC_ALL", "English_United Kingdom.1252")
[1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
> library(RcppMeCab)
> pos("これはぺんです", sys_dic="C:/PROGRA~2/MeCab/dic/ipadic")
$`これはぺんです`
[1] "これ/名詞" "は/助詞" "ぺん/名詞" "です/助動詞"
> pos("これはぺんです", join=FALSE, sys_dic="C:/PROGRA~2/MeCab/dic/ipadic")
$`これはぺんです`
名詞 助詞 名詞 助動詞
"これ" "は" "ぺん" "です"
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppMeCab_0.0.1.2
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 yaml_2.1.19 Rcpp_0.12.17 RcppParallel_4.4.0
How about changing the R console's locale to English_United States.1252 via Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")? You can also try re-installing MeCab and selecting UTF-8 as the encoding of the IPA dictionary.
Thank you for these suggestions - I have tried Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252") and Sys.setlocale("LC_ALL", "ja"), but unfortunately the List output is still empty.
I have selected UTF-8 for the MeCab installation, so that should be correct already, but I might try to reinstall.
Is it possible that I installed your package incorrectly? I installed it with devtools and install_github(), using the latest versions of R, RStudio and RTools.
It's now on CRAN: https://CRAN.R-project.org/package=RcppMeCab
We need to check the RcppMeCab results with Japanese text.
@koheiw, Could you help on this? Any ideas?