koheiw / workshop-IJTA

Introduction to Japanese Text Analysis with R

Japanese character display errors in network plot #3

Open yuanzhouIR opened 6 years ago

yuanzhouIR commented 6 years ago

Hi, Dr. Watanabe,

I ran into another problem. When I drew a network plot of the textual data with textplot_network, the plot was produced, but all the Japanese words were displayed as ロロ.

[Image: network plot]

My working environment is as follows:
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ggplot2_2.2.1    quanteda_1.1.1   newspapers_0.1.1
[4] stringi_1.1.7    XML_3.98-1.10    urltools_1.7.0  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16         sna_2.4              compiler_3.4.3      
 [4] pillar_1.2.1         git2r_0.21.0         plyr_1.8.4          
 [7] tools_3.4.3          stopwords_0.9.0      digest_0.6.15       
[10] lubridate_1.7.3      memoise_1.1.0        tibble_1.4.2        
[13] gtable_0.2.0         lattice_0.20-35      rlang_0.2.0         
[16] Matrix_1.2-13        fastmatch_1.1-0      curl_3.2            
[19] ggrepel_0.7.0        withr_2.1.2          httr_1.3.1          
[22] stringr_1.3.0        devtools_1.13.5      triebeard_0.3.0     
[25] grid_3.4.3           data.table_1.10.4-3  R6_2.2.2            
[28] spacyr_0.9.6         magrittr_1.5         scales_0.5.0        
[31] colorspace_1.3-2     labeling_0.3         network_1.13.0.1    
[34] lazyeval_0.2.1       RcppParallel_4.4.0   munsell_0.4.3       
[37] statnet.common_4.0.0

koheiw commented 6 years ago

These rectangles are called "tofu" and appear when there is no font for the characters. This is likely a font issue on Mac, similar to the one with wordcloud:

https://cdn.rawgit.com/quanteda/quanteda/aba2ebc9/docs/articles/pkgdown/examples/chinese.html

If you set a Japanese font in a similar way, the words should print properly.
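
As a minimal sketch of that idea, the snippet below picks the first font family from a list of candidates that is actually available. The helper name `first_available_font`, the candidate family names, and the `available` vector are all assumptions for illustration; in practice `available` would come from something like `extrafont::fonts()`.

```r
# Hypothetical helper: return the first candidate family found in `available`,
# or NULL when none of the candidates is present.
first_available_font <- function(candidates, available) {
  hit <- candidates[candidates %in% available]
  if (length(hit) > 0) hit[[1]] else NULL
}

# Candidate names are assumptions; adjust them to the fonts on your machine.
candidates <- c("Hiragino Sans", "MS Gothic", "IPAexGothic")

# Stand-in for the families reported on your system (assumed values).
available <- c("Arial", "MS Gothic", "Tahoma")

font <- first_available_font(candidates, available)
print(font)

# The result could then be passed to the plotting call, for example:
# textplot_network(mx_col, vertex_labelfont = font)
```

Returning NULL when nothing matches leaves the plotting function free to fall back to its default font rather than failing on an unknown family name.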

yuanzhouIR commented 6 years ago

Thanks. I added the font setting vertex_labelfont = "MS Gothic", but it returns the following error:

> Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  polygon edge not found
In addition: Warning messages:
1: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  no font could be found for family "MS Gothic"
2: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  no font could be found for family "MS Gothic"

Do you know what is wrong?

koheiw commented 6 years ago

As the error says: your machine does not have MS Gothic.

yuanzhouIR commented 6 years ago

I checked my Font Book, and it does have MS Gothic. I also tried MS Mincho, which I do not have, and it returns:

> Error in check_font(vertex_labelfont) : 
  MS Mincho is not found on your system. Run extrafont::import_font() to use custom fonts.

So I think something else is wrong.

koheiw commented 6 years ago

Did you run extrafont::import_font()?

yuanzhouIR commented 6 years ago

I did, but I think it should be extrafont::font_import(). I also ran extrafont::fonts(), which returns:

> [1] ".Keyboard"             "System Font"          
 [3] "Andale Mono"           "Apple Braille"        
 [5] "AppleMyungjo"          "Arial Black"          
 [7] "Arial"                 "Arial Narrow"         
 [9] "Arial Rounded MT Bold" "Arial Unicode MS"     
[11] "Bodoni Ornaments"      "Bodoni 72 Smallcaps"  
[13] ""                      "Brush Script MT"      
[15] "Comic Sans MS"         "Courier New"          
[17] "DIN Alternate"         "DIN Condensed"        
[19] "Georgia"               "Impact"               
[21] "Khmer Sangam MN"       "Lao Sangam MN"        
[23] "Luminari"              "Microsoft Sans Serif" 
[25] "MS Gothic"             "Tahoma"               
[27] "Times New Roman"       "Trattatello"          
[29] "Trebuchet MS"          "Verdana"              
[31] "Webdings"              "Wingdings"            
[33] "Wingdings 2"           "Wingdings 3"          

The 25th entry is MS Gothic, but when I run textplot_network(mx_col, min_freq = 0.95, edge_size = 5, vertex_labelfont = "MS Gothic"), it still returns:

> Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  polygon edge not found
In addition: Warning message:
In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  no font could be found for family "MS Gothic"

koheiw commented 6 years ago

Then please open an issue at https://github.com/quanteda/quanteda/issues and upload replication code. I have a Japanese colleague who uses a Mac.
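
A replication report for an issue like this might begin with a small check such as the one below. This is only a sketch: `font_report` is a hypothetical helper, not part of quanteda or extrafont. It records whether a family is listed by `extrafont::fonts()` (or NA when extrafont is not installed), along with the R version and platform. As the thread shows, a family can be listed there and still not be found by the graphics device, so the check narrows the problem down without settling it.

```r
# Hypothetical helper for a bug report: records whether `family` is listed by
# extrafont, plus the R version and platform in use.
# `registered` is NA when the extrafont package is not installed.
font_report <- function(family) {
  registered <- if (requireNamespace("extrafont", quietly = TRUE)) {
    family %in% extrafont::fonts()
  } else {
    NA
  }
  list(
    family     = family,
    registered = registered,
    r_version  = R.version.string,
    platform   = R.version$platform
  )
}

# Print a compact summary that can be pasted into the issue.
str(font_report("MS Gothic"))
```

Posting this output together with the exact plotting call and the output of sessionInfo() gives maintainers enough to try reproducing the error.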