Closed: tg-x closed this issue 14 years ago
seems like this works now, but there's a new problem with displaying UTF-8 characters: they appear as two Latin-1 characters
ok this is because of the change in helper.rb:
Nokogiri::HTML produces the wrong encoding, whereas with Nokogiri::XML the HTML entities do not work
fixed in tgbit/olelo/@34143fc10e3807f7a29602538973fbdb93dfdc37 & tgbit/olelo/@5c276996f582cd105f49f682cdb7c588ce5a5244
why is the html encoding wrong? Can this be fixed?
see also issue #28.
see previous comment: it works if fragments are added with Nokogiri::HTML::DocumentFragment.parse();
otherwise two Latin-1 characters are displayed instead of each multi-byte UTF-8 character
hmm this is kind of a hack
well, i don't know Nokogiri that well; maybe it's a bug in Nokogiri? does this happen to you as well? try some accented characters in the sidebar or preview. in the content area it works because the text is already there when it's passed to Nokogiri, whereas the sidebar and preview content are added later in the layout hooks
It would be nice if we could share a git repository with test pages, here on GitHub for example. On my installation everything seems to be fine.
please try the xmlentries branch
seems to be a broken libxml version. please confirm:
http://github.com/minad/olelo/commit/6ee25db8d754e75de04781377d835986b54b30f8
well, you still need the patch I posted above to get properly encoded characters, even with the newer libxml. I updated it to use XMLFragment: tgbit/olelo/@280ae6dba407f7aa17b1fc731092eb400089404b
html entities work now, but the encoding is wrong? are you on ruby 1.8.7?
html entities have been working since the change from Nokogiri::XML to Nokogiri::HTML, but the encoding has not been working properly since that change, only with these fixes. yes, ruby 1.8.7
can you create a separate test case using nokogiri?
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse('<html><body></body></html>', nil, 'UTF-8')
doc.at('body').before '<tag/>'
...
something like that.
I just noticed that the sidebar is ok now without any change; only the preview has encoding problems
i ran the following test; as you can see, when DocumentFragment is not used, two characters are printed for each accented character instead of one:
code:
require 'nokogiri'
content = '<html><head></head><body><div class="content">hëlló – wörld!</div></body></html>'
preview = '<div class="preview">hëlló – wörld</div>'
doc = Nokogiri::HTML(content, nil, 'UTF-8')
doc.css('.content').after preview
doc.css('.preview').after Nokogiri::HTML::DocumentFragment.parse preview
print doc.to_xhtml
output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<div class="content">hëlló – wörld!</div>
<div class="preview">hëlló – wörld</div>
<div class="preview">hëlló – wörld</div>
</body>
</html>
I cannot reproduce. Please post this on the nokogiri issue tracker. It works on 1.8 and 1.9 for me:
<div class="content">hëlló – wörld!</div>
<div class="preview">hëlló – wörld</div>
<div class="preview">hëlló – wörld</div>
hm, interesting, so it's just me; i'll post it there then
ok, i just realized that even though i upgraded libxml2 it's still not the latest; i'll try to upgrade further :)
ok, works now with 2.7.7
if there are html entities in the filter output, like &ndash; or &hellip; (although & does not seem to cause a problem), the following happens:
(I tried this with an emacs orgmode filter which generates html from org documents using emacs)