inukshuk / citeproc-ruby

A Citation Style Language (CSL) Cite Processor

HTML entities and hyperlinks in HTML output #5

Closed: tnhh closed this issue 12 years ago

tnhh commented 12 years ago

I'm trying to use citeproc-ruby to generate HTML from BibTeX. It doesn't seem to convert much to HTML, though. Given the file test.bib

@Article{dashes,
    author = {Joe Bloggs},
    title = {I like --- em --- dashes},
    journal = {Journal of 'apostrophes'},
    year = 2011,
    url = {http://www.example.com/}
}

the code

#!/usr/bin/env ruby

require 'citeproc'
require 'bibtex'

bib = BibTeX.open('test.bib', :filter => :latex)

puts "<html><body>"
puts CiteProc.process bib.to_citeproc, :style => 'apa', :format => 'html'
puts "</body></html>"

gives me

Bloggs, J., 2011. I like — em — dashes. Journal of ’apostrophes’, 3–5. Available at: http://www.example.com/.

whereas I would have expected to see something more like

Bloggs, J. (2011). I like &mdash; em &mdash; dashes. Journal of &lsquo; apostrophes &rsquo;, 3&mdash;5. Available at: <a href="http://www.example.com/">http://www.example.com/</a>.

Is there a filter that I need to pass?

inukshuk commented 12 years ago

It seems you have a unicode issue. Just for comparison:

require 'bibtex'
require 'citeproc'

bib = BibTeX.parse(DATA, :filter => :latex)
puts bib[:dashes].title

puts CiteProc.process bib.to_citeproc
puts CiteProc.process bib.to_citeproc, :format => :html

__END__
@Article{dashes,
    author = {Joe Bloggs},
    title = {I like --- em --- dashes},
    journal = {Journal of 'apostrophes'},
    year = 2011,
    url = {http://www.example.com/}
}

produces:

I like — em — dashes
Bloggs, J. (2011). I like — em — dashes. Journal of ’apostrophes’. Retrieved from http://www.example.com/.
Bloggs, J. (2011). I like — em — dashes. <i>Journal of ’apostrophes’</i>. Retrieved from http://www.example.com/.

That is to say, the LaTeX filter correctly turns --- into —; this is a Unicode character, and it is not displayed correctly for you. The reason is probably that your locale setting does not support Unicode. Are you executing this in a Unix terminal? If so, what is the output of:

 $ echo $LC_ALL
 $ echo $LANG

Come to think of it, what Ruby version are you using?
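
For reference, the same information can be collected from inside Ruby. This is only a minimal sketch: `encoding_report` is a hypothetical helper name, not part of citeproc-ruby or bibtex-ruby, and it assumes Ruby 1.9 or later, since the `Encoding` class does not exist in 1.8.

```ruby
# Hypothetical helper (not part of citeproc-ruby or bibtex-ruby).
# Assumes Ruby 1.9+, where the Encoding class is available.
def encoding_report
  {
    ruby_version: RUBY_VERSION,
    # Encoding Ruby assumes for data read from and written to the outside world
    external_encoding: Encoding.default_external.name,
    # Locale variables from the environment; nil when unset
    lc_all: ENV['LC_ALL'],
    lang: ENV['LANG']
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

If the external encoding is not UTF-8, that would be consistent with a locale that does not support Unicode.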

Also, notice that LaTeX distinguishes between left and right apostrophes: a backtick (`) produces a left quote and a straight apostrophe (') produces a right quote. As you can see in my output, the LaTeX filter correctly printed two right quotes, because that's what you specified in your input.

If you don't want to use unicode at all but have HTML entity codes instead, you will have to write your own filter.
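
One way to approach this, sketched under assumptions: rather than hooking into the filter mechanism, post-process the string that the processor returns. The helpers `entity_encode` and `linkify` and the `NAMED_ENTITIES` table below are hypothetical names made up for illustration; they are plain Ruby and not part of citeproc-ruby or bibtex-ruby. The URL pattern is deliberately simple and does not escape characters such as `&` inside the link.

```ruby
# Hypothetical post-processing helpers (not part of citeproc-ruby or bibtex-ruby).

# A few common typographic characters and their named HTML entities.
NAMED_ENTITIES = {
  "\u2014" => '&mdash;',
  "\u2013" => '&ndash;',
  "\u2018" => '&lsquo;',
  "\u2019" => '&rsquo;',
  "\u201C" => '&ldquo;',
  "\u201D" => '&rdquo;'
}.freeze

# Replace every non-ASCII character with a named entity if one is listed
# above, otherwise with a decimal numeric character reference.
def entity_encode(text)
  text.gsub(/[^[:ascii:]]/) do |char|
    NAMED_ENTITIES.fetch(char) { format('&#%d;', char.ord) }
  end
end

# Wrap bare http(s) URLs in anchor tags. Trailing punctuation such as the
# final full stop of a reference is left outside the link.
def linkify(text)
  text.gsub(%r{https?://[^\s<]+[^\s<.,;:)]}) do |url|
    %(<a href="#{url}">#{url}</a>)
  end
end

line = "Bloggs, J. (2011). I like \u2014 em \u2014 dashes. " \
       "Retrieved from http://www.example.com/."
puts linkify(entity_encode(line))
```

The same two calls could be applied to whatever string `CiteProc.process` returns before it is printed.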

tnhh commented 12 years ago

Thanks for another quick reply. Unicode is part of it, but I guess I just assumed HTML output when using ":format => :html". I'm afraid I haven't got time to learn Ruby at the moment, so I'll just stick to my working awk scripts for this project. Thanks anyway!