yihui / servr

A simple HTTP server in R
https://cran.rstudio.com/package=servr

encoding problem with Chinese #31

jiwonl closed this issue 7 years ago

jiwonl commented 7 years ago

Hi, I'm trying to cluster text files written in Chinese. The tcm and vocab were created with UTF-8 encoding. However, plotting the LDA model didn't work. Does the servr package work with UTF-8? If it does, I guess the problem is in my code. I would be very thankful if you could look at my code and screenshots:

Output of str(mylist): (screenshot not preserved)

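library(text2vec)   # itoken(), create_vocabulary(), LDA, ...
library(magrittr)   # provides the %>% pipe

# Tokenize the documents; document ids are the file names
it = itoken(mylist, ids = files.list, progressbar = FALSE)
# Build the vocabulary, dropping very rare and very common terms
v = create_vocabulary(it) %>% 
  prune_vocabulary(term_count_min = 10, doc_proportion_max = 0.2)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer, type = "lda_c")

# Fit a 10-topic LDA model
lda_model = 
  LDA$new(n_topics = 10, vocabulary = v, 
          doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr = 
  lda_model$fit_transform(dtm, n_iter = 60, convergence_tol = 0.01, 
                          check_convergence_every_n = 10)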


library(LDAvis)
library(servr)
lda_model$plot()


yihui commented 7 years ago

I'm not sure whether you have contacted the author of LDAvis (@cpsievert), but before you bother him, please try updating R and all your R packages (update.packages(ask = FALSE)). If the problem still persists after updating, try:

devtools::install_github('rstudio/htmltools')

If this still does not fix the issue, please provide the output of:

devtools::session_info('servr')
devtools::session_info('htmltools')

Note that you should restart R every time before you install anything and retry.

jiwonl commented 7 years ago

I figured out that the JSON file had been saved in ANSI encoding (the Windows system code page). I re-encoded it as UTF-8 and it worked. Thanks :)
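For anyone hitting the same problem, here is a minimal R sketch of that fix. It is not code from this thread. It assumes the JSON file is currently in the R session's native encoding (what Windows Notepad labels "ANSI"); if it is not, pass the source encoding explicitly, e.g. from = "CP936" for GBK. The function name reencode_to_utf8 is hypothetical. The function reads the lines, converts them with iconv() to UTF-8, and writes the bytes back verbatim with useBytes = TRUE so that R does not re-encode them on output.

```r
# Hypothetical helper: re-encode a text file from a legacy code page to UTF-8.
# Assumption: the file is in the native encoding (from = "") unless told otherwise.
reencode_to_utf8 <- function(path, from = "") {
  txt <- readLines(path, warn = FALSE)
  # Heuristic: if every line is already valid UTF-8, leave the file untouched
  # (this also makes repeated calls harmless).
  if (all(validUTF8(txt))) return(invisible(path))
  # iconv() ignores any declared encoding marks and converts from `from`
  utf8 <- iconv(txt, from = from, to = "UTF-8")
  if (anyNA(utf8)) {
    stop("could not convert ", path, " from '", from, "' to UTF-8")
  }
  # Binary mode plus useBytes = TRUE writes the UTF-8 bytes exactly as they are
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(utf8, con, useBytes = TRUE)
  invisible(path)
}
```

Usage: call reencode_to_utf8() on the path of the JSON file that the visualization wrote, for example reencode_to_utf8(file.path("ldavis_output", "lda.json")), where that directory and file name are placeholders for wherever the file actually lives. The validUTF8() check is only a heuristic, since a few legacy byte sequences can happen to be valid UTF-8.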

yihui commented 7 years ago

Perfect. Thanks for posting back!