koheiw / workshop-IJTA

Introduction to Japanese Text Analysis with R

no object in values #1

Open MIZUNOYUSUKE opened 7 years ago

MIZUNOYUSUKE commented 7 years ago

I made tables by following @koheiw's tutorial (https://github.com/koheiw/IJTA/blob/master/documents/corpus.md) to learn how to use a corpus. Although the other steps completed successfully, I ran into a problem when I tried to count the documents in a subset. The object named "data_corpus_asahi_2016" exists.

corp_morning <- corpus_subset(data_corpus_asahi_2016, edition == '朝刊') # select only the morning edition
ndoc(corp_morning)

To troubleshoot this, I confirmed that both the machine's default encoding and RStudio's were UTF-8, and then ran the commands below.

> require(quanteda) # load the package
> txt <- readLines("data/asahi_head.txt")
> setwd('C:\\Users\\mizuno yusuke\\Downloads\\IJTA-master\\IJTA-master')
> load('data/data_corpus_asahi_2016.RData') # load the R object
> table(docvars(corp, 'month'))

I worked under the environment below.

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Japanese_Japan.932  LC_CTYPE=Japanese_Japan.932    LC_MONETARY=Japanese_Japan.932
[4] LC_NUMERIC=C                   LC_TIME=Japanese_Japan.932    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quanteda_0.9.9-50

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11        lattice_0.20-35     digest_0.6.12       withr_1.0.2         plyr_1.8.4         
 [6] grid_3.4.0          gtable_0.2.0        scales_0.4.1        RcppParallel_4.3.20 ggplot2_2.2.1      
[11] rlang_0.1.1         stringi_1.1.5       lazyeval_0.2.0      data.table_1.10.4   Matrix_1.2-9       
[16] fastmatch_1.1-0     devtools_1.13.1     tools_3.4.0         munsell_0.4.3       compiler_3.4.0     
[21] colorspace_1.3-2    memoise_1.1.0       tibble_1.3.1

koheiw commented 7 years ago

@MIZUNOYUSUKE Thank you for the report. Could you also post the error message?

MIZUNOYUSUKE commented 7 years ago

No error message in red appeared, but running the code produced the following.

> corp_morning <- corpus_subset(data_corpus_asahi_2016, edition == '朝刊') # select only the morning edition
> ndoc(corp_morning)
[1] 0
> table(weekdays(docvars(corp_morning, 'date')))
< table of extent 0 >

koheiw commented 7 years ago

The input on the R console may not be in UTF-8. As a test, please type the following commands by hand instead of copying them, and run them.

Encoding('朝刊')
Encoding('あいうえお')
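
As background for the check above, here is a minimal base-R sketch of how the same visible text can be stored as different bytes depending on the encoding. It is an illustration only, not a confirmed diagnosis of this issue. The variable names (x_utf8, x_cp932, x_back) are hypothetical, the string is written with \u escapes so that the snippet does not depend on the console encoding, and CP932 is assumed as the other encoding because the locale reported above is Japanese_Japan.932.

```r
# '朝刊' written with Unicode escapes, so R marks the string as UTF-8
x_utf8 <- "\u671d\u520a"
Encoding(x_utf8)      # declared encoding of the string
charToRaw(x_utf8)     # the bytes that are actually stored

# Re-encode to CP932 (assumed here to be the Windows console encoding).
# iconv() returns NA when the platform does not support the conversion.
x_cp932 <- iconv(x_utf8, from = "UTF-8", to = "CP932")

if (!is.na(x_cp932)) {
  # Same characters, different byte sequence
  print(charToRaw(x_cp932))
  print(identical(charToRaw(x_utf8), charToRaw(x_cp932)))

  # Converting back to UTF-8 recovers the original bytes
  x_back <- iconv(x_cp932, from = "CP932", to = "UTF-8")
  print(identical(charToRaw(x_back), charToRaw(x_utf8)))
}
```

If Encoding() on a typed literal does not report "UTF-8", passing the literal through enc2utf8() before using it in a comparison is one thing that could be tried.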