Open hetong007 opened 10 years ago
Can you print out your sessionInfo() so that I can see what versions of packages you are using?
Here comes:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Chinese_People's Republic of China.936
[2] LC_CTYPE=Chinese_People's Republic of China.936
[3] LC_MONETARY=Chinese_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese_People's Republic of China.936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.2
And the result was generated with slidify 0.3.3
It seems to work with the latest version of slidify. I checked online using the slidify playground at http://slidify.github.io/playground. Make sure to remove the line with mode before you paste it to the playground.
You can install the latest version of slidify and slidifyLibraries by running
devtools::install_github(c('slidify', 'slidifyLibraries'), 'ramnathv')
Before you slidify your deck, make sure to delete the libraries folder in your slide deck directory.
I hit the same problem after installing the latest version with your code.
Since Linux and OS X handle Chinese text without trouble, I guess the success of the slidify playground is not surprising.
But is the slidify playground running on Windows? I suspect the way it deals with UTF8 and GBK is the main problem.
You are right. I believe the issue is a combination of Windows + Encoding. Let me see if I can test under Windows and get back on this.
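To make the UTF8 versus GBK mismatch concrete, here is a small base-R sketch. It is not part of slidify; the sample word and the variable names are only illustrative. It assumes the local iconv supports GBK and that R is 3.3.0 or newer, since validUTF8() was added in that release.

```r
# "\u6807\u7b7e" is the Chinese word for "tag"; \u escapes always give UTF-8 strings in R.
tag <- "\u6807\u7b7e"

# The same two characters as raw bytes in each encoding.
utf8_bytes <- charToRaw(enc2utf8(tag))
gbk_bytes  <- iconv(tag, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]]

length(utf8_bytes)  # 6: three bytes per character in UTF-8
length(gbk_bytes)   # 4: two bytes per character in GBK

# GBK bytes are not valid UTF-8, so a tool that assumes UTF-8 will choke on them.
gbk_is_valid_utf8 <- validUTF8(rawToChar(gbk_bytes))

# Decoding with the correct source encoding recovers the original text.
roundtrip <- iconv(rawToChar(gbk_bytes), from = "GBK", to = "UTF-8")
```

Any step in a toolchain that guesses the wrong one of these two encodings either raises an "invalid multibyte string" error or silently produces unreadable characters.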
Most Chinese users are suffering from it because Windows is still the most popular OS in China. A lot of users would benefit from fixing this issue :)
Can you try this @hetong007 ? It runs the index.Rmd through knitr directly, before passing it on to Slidify. This solution has fixed some problems with encoding, and I wanted to check if it has any effect on this problem.
slidify(knit("index.Rmd", encoding = 'GBK'), knit_deck = FALSE)
I used that code on the GBK file. The result remains exactly the same.
I also tried slidify(knit("index.Rmd", encoding = 'UTF8'), knit_deck = FALSE) on the UTF8 version. Not working either.
Okay. Let me try to isolate the problem here. If you run knit2html on your Rmd file, are the characters displaying correctly? Let us first try to make it work with knitr and then focus on how to get slidify working with it.
knit2html is not working correctly under Windows. I got error messages.
This is what I got from running it on the GBK version:
> knit2html('index.Rmd')
processing file: index.Rmd
|.................................................................| 100%
ordinary text without R code
output file: index.md
Error in sub("#!r_highlight#", highlight, html, fixed = TRUE) :
invalid multibyte string at '<9f><<2f>title>
#!r_highlight#
#!mathjax#
<style type="text/css">
body, td {
font-family: sans-serif;
background-color: white;
font-size: 12px;
margin: 8px;
}
tt, code, pre {
font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace;
}
h1 {
font-size:2.2em;
}
h2 {
font-size:1.8em;
}
h3 {
font-size:1.4em;
}
h4 {
font-size:1.0em;
}
h5 {
font-size:0.9em;
}
h6 {
font-size:0.8em;
}
a:visited {
color: rgb(50%, 0%, 50%);
}
pre {
margin-top: 0;
max-width: 95%;
border: 1px solid #ccc;
white-space: pre-wrap;
}
pre code {
display: block; padding: 0.5em;
}
code.r, code.cpp {
background-color: #F8F8F8;
}
table, td, th {
border: none;
}
blockquote {
color:#666666;
margin:0;
padding-left: 1em;
border-left: 0.5em #EEE solid;
}
hr {
height: 0px;
border-bottom: none;
border-top-width: thin;
border-top-style: dotted;
This is what I got from running it on the UTF8 version:
> knit2html('index.Rmd')
processing file: index.Rmd
|.................................................................| 100%
ordinary text without R code
output file: index.md
Error in substring(u, so, so + ml - 1L) :
invalid multibyte string at '<9f><<2f>h2>
<hr/>
<blockquote>
<ul>
<li>璞嗙摚鐢靛奖涓殑鏍囩
<ul>
<li><img src="pics/what_is_folksonomy2.png" alt=""/></li>
</ul></li>
<li>璞嗙摚闊充箰涓殑鏍囩
<ul>
<li><img src="pics/what_is_folksonomy3.png" alt=""/></li>
</ul></li>
<li>璞嗙摚闃呰涓殑鏍囩
<ul>
<li><img src="pics/what_is_folksonomy4.png" alt=""/></li>
</ul></li>
</ul>
</blockquote>
<hr/>
<h2>浠€涔堟槸鏍囩</h2>
<blockquote>
<ul>
<li>鐢ㄦ埛涓诲姩鐢熸垚</li>
<li>瀵规枃瀛楀唴瀹逛笉鍔犻檺鍒<b6></li>
<li>鏄鐗╁搧鏈夌泭鐨勮ˉ鍏呰鏄庝俊鎭<af></li>
<li>鑻辨枃閲岀О杩欐牱鐨勪笢瑗垮彨鍋<9a><strong>folksonomy</strong>(folk+taxonomy)锛屽苟涓嶆槸<em>tag</em></li>
</ul>
</blockquote>
<hr/>
<h2>鏍囩鏃犲涓嶅湪</h2>
<p>闄や簡璞嗙摚锛屽叾瀹炶繕鏈夊緢澶氬湴鏂瑰嚭鐜颁簡鏍囩锛<9a></p>
<blockquote>
<ul>
<li>鏂版氮寰崥涓殑鏍囩
<ul>
<li><img src="pics/folksonomy_is_everywhere1.png" alt=""/></li>
</ul></li>
<li>缁熻涔嬮兘涓殑鏍囩
<ul>
<li><img src="pic
You need to explicitly pass the encoding to knit2html using knit2html('index.Rmd', encoding = "GBK").
Sorry, but the result still remains the same :(
Okay. Can you save your Rmd file and provide me a link to it? Don't copy and paste it, as I want to ensure that it is saved with the correct encoding. Since you are having trouble using knit2html as well, @yihui may have some idea as to what might be messing things up. Also print your sessionInfo() so that we know the versions of all packages that were loaded in your R Console.
@yihui is not a Windows user, maybe he chose to ignore those errors before :(
Here is a repo I just created with the Rmd files index-GBK.Rmd and index-UTF8.Rmd. Also, sessionInfo.txt has the result from sessionInfo().
Well, knitr has lots of Windows users and I have seen @yihui do a lot of encoding related work. If there is an R expert on encoding, my money will be on @yihui :)
Chinese programmers suffer from encoding related problems every day. Thank you and good luck! :)
I think I know what the problem is, but it will take me a while to find out where the character encoding got messed up. The encoding of this page https://github.com/hetong007/temp_files/blob/master/index-GBK.html is not UTF-8, but it contains the spec <meta charset="utf-8">, which is wrong. Actually this page contains characters with different encodings: some are UTF-8 and some are GBK. The problem might be in slidify, slidifyLibraries, whisker, or markdown.
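One way to locate where a file mixes encodings is to check each line for valid UTF-8. The sketch below is an illustration in base R, not something used in this thread: non_utf8_lines() is a hypothetical helper, the demo file is synthetic, and validUTF8() needs R 3.3.0 or newer.

```r
# Hypothetical helper: return the line numbers whose bytes are not valid UTF-8.
non_utf8_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)  # bytes are kept as-is, not re-encoded
  which(!validUTF8(lines))
}

# Build a small file whose first line is UTF-8 and whose second line is GBK.
tag <- "\u6807\u7b7e"
mixed <- c(
  charToRaw("<h2>"), charToRaw(enc2utf8(tag)), charToRaw("</h2>\n"),
  charToRaw("<li>"), iconv(tag, "UTF-8", "GBK", toRaw = TRUE)[[1]], charToRaw("</li>\n")
)
path <- tempfile(fileext = ".html")
writeBin(mixed, path)

bad <- non_utf8_lines(path)  # only the GBK line is reported
```

Running the helper on a generated index.html should show which parts of the page came through in the wrong encoding, which in turn hints at the package that produced them.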
@hetong007 I rarely use Windows myself, but that does not mean I do not care about Windows users :)
@yihui, I understand why slidify fails on this file. The <meta charset="utf-8"> is from the slidifyLibraries template for the io2012 library, and can be fixed by modifying this line in the libraries folder.
The failure of knit2html is possibly explained either by the mixed encoding, or by the utf-8 encoding specified in the default template:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
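To check what a generated page or template claims about its own encoding, the declared charset can be pulled out of the meta tag. This is a rough sketch with a hypothetical helper, declared_charset(), and a deliberately simple regular expression; it is not how slidify or markdown read their templates.

```r
# Hypothetical helper: extract the charset declared in an HTML <meta> tag.
# Handles both <meta charset="..."> and content="text/html; charset=...".
declared_charset <- function(html) {
  pattern <- "charset[[:space:]]*=[[:space:]]*[\"']?([A-Za-z0-9_-]+)"
  # useBytes = TRUE avoids errors when the page contains bytes that are
  # invalid in the current locale.
  m <- regmatches(html, regexec(pattern, html, ignore.case = TRUE, useBytes = TRUE))
  vapply(m, function(x) if (length(x) >= 2) tolower(x[2]) else NA_character_,
         character(1))
}

short_form <- declared_charset('<meta charset="utf-8">')
long_form  <- declared_charset(
  '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
)
no_meta    <- declared_charset("<p>no declaration here</p>")
```

If the declared value is utf-8 but the body bytes are GBK, the browser will render unreadable characters even though the file itself is intact.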
I am thinking @hetong007 needs to convert the entire document to GBK or UTF-8 and then modify the template if he is using GBK. Does that sound about right, @yihui ? Thanks for taking a look at this.
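Converting a whole file to a single encoding can be done in base R. The following is a minimal sketch under stated assumptions: convert_to_utf8() is a hypothetical helper, the input is assumed to be consistently GBK, and the local iconv is assumed to support GBK. It works on raw bytes so that the result does not depend on the current locale.

```r
# Hypothetical helper: re-encode a whole text file to UTF-8.
convert_to_utf8 <- function(src, dest, from = "GBK") {
  bytes <- readBin(src, what = "raw", n = file.size(src))
  text  <- iconv(rawToChar(bytes), from = from, to = "UTF-8")
  if (is.na(text)) stop("'", src, "' could not be decoded as ", from)
  writeBin(charToRaw(text), dest)  # write the UTF-8 bytes unchanged
  invisible(dest)
}

# Demo on a synthetic GBK file containing one heading line.
src  <- tempfile(fileext = ".Rmd")
dest <- tempfile(fileext = ".Rmd")
writeBin(iconv("## \u6807\u7b7e\n", "UTF-8", "GBK", toRaw = TRUE)[[1]], src)
convert_to_utf8(src, dest)
converted <- readBin(dest, what = "raw", n = file.size(dest))
```

Converting to UTF-8 rather than GBK has the advantage that the charset=utf-8 declaration in the template stays correct and does not need editing.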
I'll take a look at @kohske's PR rstudio/markdown#49 and rstudio/markdown#50. The problem should be at least alleviated after the encoding problem is gone in the markdown package, although there are still other places that may have to be fixed.
Thanks @yihui. I will look forward to these fixes. I presume that these issues are non-existent with rmarkdown, or is encoding handling still going to be tricky?
FYI, here is the encoding fix for markdown, slidify, and knitrBootstrap. I hope someone else will also test this and confirm that it does not break any existing code.
The below is the test script and markdown files: http://kohske.github.io/sandbox/knit-encode.zip
kohske
I tested the UTF8 file including GBK characters (below) and slidify works perfectly on Windows!! https://github.com/hetong007/Douban_Folksonomy/blob/master/index.Rmd
Note that before running slidify, change the locale's code page to 936.
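From within R, one way to switch the character-type locale for a single call and then restore it is sketched below. This is an assumption about what changing the code page means in practice; with_ctype() is a hypothetical helper, and the Windows locale name in the comment is the one reported by sessionInfo() earlier in this thread.

```r
# Hypothetical helper: evaluate `code` under a temporary LC_CTYPE locale.
with_ctype <- function(locale, code) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)  # always restore
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(new)) stop("locale not available on this system: ", locale)
  force(code)  # `code` is lazy, so it is evaluated only after the switch
}

# Portable demo: the "C" locale exists on every platform.
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after  <- Sys.getlocale("LC_CTYPE")

# On Windows, the call in this thread might look like this (untested here):
# with_ctype("Chinese_People's Republic of China.936",
#            slidify("index.Rmd", encoding = "UTF8"))
```

Restoring the previous locale in on.exit() keeps the rest of the R session unaffected even if the wrapped call fails.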
Thanks @kohske. This is a really significant contribution as it opens up things for a large group of users. I will run through the tests and merge this weekend. Can you add yourself as a contributor in the DESCRIPTION file?
@kohske Thanks, this solution works perfectly on my Windows XP!
However, the framework of the generated slides is not the same as before, i.e. io2012 is not applied to the generated file. Is that caused by the dev version of slidify, @ramnathv ?
Are you using RStudio? If yes, what version? If you can paste a screenshot of the output you get, that would be useful for me to figure out what might be going on.
@ramnathv Okay, thanks. Note that MBCS-compatible slidify requires MBCS-compatible markdown package.
@kohske After running install_github("kohske/knitrBootstrap@fix/encode", quick=TRUE), there's a warning saying package ‘’ is not available (for R version 3.0.2). The name of the 'missing' package is empty. Is it a tiny bug or did I just miss something? Thank you.
@hetong007 This is due to the DESCRIPTION of knitrBootstrap: the entry R (> 3.0.0), (with a trailing comma) should be R (> 3.0.0)
Please just ignore the warning. Thanks for your test and report!!
@ramnathv I am using the newest RStudio, i.e. 0.98.692, under dev_mode(), and I am generating the html file with only the pics folder and the index.Rmd file from the original repository.
The output information is
d> slidify("Douban_Folksonomy-master/index.Rmd", encoding="UTF8")
processing file: index.Rmd
|.................................................................| 100%
ordinary text without R code
output file: index.md
Copying files to libraries/frameworks/io2012...
Copying files to libraries/highlighters/highlight.js...
Copying files to libraries/widgets/bootstrap...
Warning messages:
1: In readLines(con, ...) : incomplete final line found on 'index.Rmd'
2: In readLines(con, ...) : incomplete final line found on 'index.Rmd'
And the first page looks like
The second page looks like
Compared with this original version, the difference is easy to see.
@hetong007 The libraries in the original repository are evidently quite old. Generating under Mac OS X with the newer version gives the same result.
@kohske is right. I updated the default stylesheets for io2012, adding the bottle green background in the title slide and the blue color for slide titles. You can always modify it, if you prefer a different appearance of the slides.
@ramnathv @kohske Thanks for pointing that out. Then I would say Chinese users (and maybe Japanese and other users as well) will enjoy slidify on Windows! Thanks :)
Thanks to @kohske for so diligently plugging away on this. Encoding issues are not the most pleasant ones to be working on, but are so critical. I will try to merge this pull request this weekend, after ensuring that it doesn't break any other features of slidify. @kohske, please add yourself as a contributor in the DESCRIPTION!
@ramnathv I did it, thanks.
Thanks to @kohske, I just merged in some changes that provide for better encoding support. You can install it from the fix-encode branch.
library(devtools)
install_github("ramnathv/slidify@fix-encode")
Can you install it and test if it solves the encoding issues you had mentioned here?
This fixes everything on my system, although I am now using Win 7 instead of Win XP. I hope that doesn't matter.
I created two Rmd files, in GB2312 and UTF8 respectively, and ran the following code:
library(devtools)
install_github("ramnathv/slidify@fix-encode")
# setwd(...)
require(slidify)
slidify('index.Rmd', encoding='CP936')
slidify('index-UTF8.Rmd', encoding='UTF8')
The result is great.
Thank you @ramnathv and @kohske
Thanks @ramnathv, everything works perfectly with Japanese_Japan.CP932 and UTF8 under Win7.
Thanks for all your efforts! @hetong007 @ramnathv @kohske This patch works well with Traditional Chinese under Win8 (with encoding UTF8) as well. Great job!
All credit should go to @kohske for painstakingly working on fixing encoding related issues.
Is the fix-encode branch ready to be merged, then?
Yes. I will be merging it this weekend, when I will be working on slidify.
Chinese characters are typically encoded as UTF8 on Linux and OS X, but as GBK on Windows. Slidify currently has trouble handling the difference between UTF8 and GBK.
One can clone my repo Douban_Folksonomy to reproduce the following result. A properly generated html version (built under Ubuntu 12.04) is available here. I am using Windows XP, but the same problem occurs on Windows 7 as well.
Here are the first few lines in my 'index.Rmd' file:
On Windows, if my 'index.Rmd' file is encoded as UTF8, the slidify function throws an error with unrecognized Chinese characters.
The characters shown are obviously different, and of course nobody could read the latter.
If I switch to GBK for the Chinese characters, the slidify function runs:
But the html contains unrecognized characters:
Compare with the proper version: