Closed etc closed 13 years ago
It would also be nice to properly convert `` and '' to “ and ”, and likewise for single quotes. Apologies if any of this is outside the scope of what latex-decode is intended to do; I didn't know if you wanted it to be as general purpose as, for example, parsing \emph{} markup. (In case you didn't know about \,, it is the code for thin space, Unicode U+2009.)
I suspect that this is caused by improper handling of backslashes somewhere in the parsing process. I'll try to fix it ASAP; meanwhile, it would be great if you could convert the examples to proper cucumber features – that way we can make sure there will be no regressions for this issue.
Originally, the scope of latex-decode was to include all possible conversions of LaTeX directives that can be represented by Unicode; for that reason, \emph is problematic, because we'd have to decide which markup to convert to.
By the way, fascinating stuff, your bibliography ;-)
I am happy to write some associated cucumber features—but will probably not get to it for a few days. (All the samples I'm using are taken from https://github.com/etc/philosophy-bibliography)
Brad!
Thanks for the test cases and my apologies for not having been able to look at this sooner.
At first I suspected this to be an issue with string escape sequences in the parser (those can be quite annoying to track down), but as it turns out, the issue was that names needed a little extra treatment, because regular values consist of tokens, but names consist of name tokens and each name token, again, consists of the individual name parts.
Additionally, you were using a number of conversions latex-decode wasn't aware of. I've added most of them, but I feel a little bit uneasy about the quotation mark replacements. Would you expect every single ' to be converted (as your test cases seem to suggest)? I have implemented it that way for now, but I wonder if this is indeed the right approach.
I have yet to add support for the thin space. Meanwhile, could you check whether the solution works for you? Your features should all pass, except for one: is it Äsa Maria Wikforss or Åsa Maria Wikforss? I suspect the latter, but I thought I'd better ask you, as you may want to fix that in the bibliography.
To test you will need to issue
$ [sudo] bundle install
in order to fetch the latest latex-decode.
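On the quotation-mark question above, here is a minimal Ruby sketch of the "convert every quote" approach. It is only an illustration under stated assumptions: decode_quotes is a hypothetical helper, not latex-decode's actual implementation, and the sample strings are made up.

```ruby
# Hypothetical helper, not latex-decode's real code: converts TeX-style
# quotes to typographic quotes by plain substitution.
def decode_quotes(text)
  text
    .gsub('``', "\u201C") # `` becomes an opening double quote
    .gsub("''", "\u201D") # '' becomes a closing double quote
    .gsub('`', "\u2018")  # a remaining ` becomes an opening single quote
    .gsub("'", "\u2019")  # every remaining ' becomes a closing single quote
end

puts decode_quotes("``Naming and Necessity''")
puts decode_quotes("Kripke's `rigid' designators")
```

The double-quote forms have to be replaced before the single ones, otherwise `` would be read as two single quotes. Converting every ' also covers apostrophes, but it would mangle accent commands such as \' if those have not been decoded first, which is one reason to be cautious about this approach.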
Brad,
I've added \, support to latex-decode, but I'll hold off on a new gem release in case there are any other symbols, diacritics, etc. that you need but that are not supported yet.
About the \emph issue: we could include a LaTeX to HTML (or something else) converter in BibTeX Ruby proper; I imagine this may be quite useful if you are processing the entries directly (plus, citeproc and citeproc-js generally try to handle HTML tags gracefully, too). What do you say?
Just pushed 2.0.1; when using latex-decode 0.0.7 the \, conversion should be supported, too. I'll close the issue for now; please reopen if I've missed something.
On a different note: is there any bibtex or cite-formatting related support you need with maldini? I've been meaning to take a look at that, as I'll need to integrate academic citations with a website later this year.
\c c was not mentioned; it does not work for me.
Alex, the c-cedilla should be supported by latex-decode. This works for me:
$ gem i latex-decode
Successfully installed latex-decode-0.0.7
1 gem installed
Installing ri documentation for latex-decode-0.0.7...
Installing RDoc documentation for latex-decode-0.0.7...
$ irb -r "latex/decode"
001:0> LaTeX.decode '\c{c}'
=> "ç"
002:0> LaTeX.decode '\c{C}'
=> "Ç"
I was using Ruby 1.9.2 but it is supposed to work on other versions as well. Please report back if this example does not work.
Thanks! I'll use latex-decode then. I just thought that convert(:latex) should have taken care of it.
It does not, however, accept the syntax without curly brackets:
pry(main)> LaTeX.decode '\c{C}'
=> "Ç"
pry(main)> LaTeX.decode '\c C'
=> "\\c C"
I have those in my bibliography.
The Latex filter uses the latex-decode gem, so whatever works there, should work in bibtex-ruby, too.
If the syntax without curly braces is supported by latex, we can add it to latex-decode (it is very easy to add conversions there).
I will look into it and try to give a minimal example. It seems that in my bibliography I have Author = {... Fran{\c c}ois ...}, but after convert(:latex) it becomes Fran\c cois. Both are acceptable in LaTeX, but I think that in BibTeX bibliography fields it is common, if not necessary, to surround such commands with curly brackets.
Here is how LaTeX.decode behaves with '{\c c}': instead of replacing it with 'ç', it simply strips the curly brackets:
pry(main)> puts LaTeX.decode('Fran{\c c}ois')
Fran\c cois
Can you point to any official LaTeX documentation that describes this syntax you are using? If I run this through xelatex I get a 'control sequence undefined' error.
If this is not valid syntax we can't add it to latex-decode; however, you can easily write your own filter: just pass an object that responds to :apply to #convert or, alternatively, write a class that inherits from BibTeX::Filter and implements #apply – then you can call your filter by name:
class MyFilter < BibTeX::Filter
  # Replace the braced cedilla form {\c c} with the Unicode character.
  def apply(input)
    input.gsub(/\{\\c c\}/, 'ç')
  end
end
Now you can use your filter on a Bibliography object with bib.convert_myfilter or bib.convert(:myfilter). Alternatively, as I said above, you can pass in any object that responds to :apply; take a look at this test case for an example.
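As a rough illustration of the second option, here is a sketch of a plain object that responds to apply. The class name CedillaFilter is hypothetical, and the sketch deliberately does not load bibtex-ruby: it only exercises the object on a bare string, so how the library calls apply internally is an assumption based on the description above.

```ruby
# Hypothetical stand-in for a custom filter: any object with an #apply
# method that takes a value and returns the converted string.
class CedillaFilter
  def apply(value)
    # Replace the braced form {\c c} with a precomposed c-cedilla.
    value.to_s.gsub(/\{\\c c\}/, "\u00E7")
  end
end

filter = CedillaFilter.new
puts filter.apply('Fran{\c c}ois')

# With bibtex-ruby loaded, the description above suggests such an object
# could be handed to convert on a bibliography, e.g. bib.convert(filter).
```

Because the object does not inherit from BibTeX::Filter, it presumably cannot be called by name; it has to be passed in directly.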
Anyway, please let me know if the {\c c} syntax is indeed valid LaTeX; if so, I'll add it to latex-decode asap.
Thanks for the explanation and suggested workarounds. I will try to see if I can find some "official" documentation. However, this syntax is valid to the best of my knowledge; it is basic TeX syntax, which is supported in LaTeX. There are commands which accept a single token as an argument, and this token is just the first token or the first group inside {} that follows the command, possibly after a sequence of spaces. This is what I vaguely remember from Knuth's TeXbook. The following example works for me in TeX Live LaTeX:
\documentclass{article}
\begin{document}
\c c
\c {abc}
\' c
\end{document}
The bibliography with which I am dealing was downloaded from the HAL archive. It contains {\c c}. As far as I understand, curly brackets in BibTeX fields have an additional meaning besides the usual one of delimiting a group (like comments in config files that actually contain metadata to be processed): they tell BibTeX not to post-process what is inside (in particular, not to change the case).
I'll look for something official.
I just checked one more time using latex and you're absolutely right, everything works. We'll add support for this to latex-decode.
Thank you. I think the best way would be to implement the standard TeX parsing rules. LaTeX documentation usually suggests using brackets, but they are not needed for parsing.
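The argument rule described above (a command such as \c takes the first token or the first braced group that follows it, possibly after spaces) can be sketched in Ruby for the cedilla case. This is only an illustration: decode_cedilla and COMBINING_CEDILLA are hypothetical names, not part of latex-decode, and the sketch assumes ASCII letters and handles single-letter arguments only.

```ruby
# Hypothetical helper, not part of latex-decode: decodes the cedilla
# accent \c for a single-letter argument, braced or bare.
COMBINING_CEDILLA = "\u0327"

def decode_cedilla(text)
  # \c must not be followed directly by a letter (\cc would be a
  # different control word). The argument is either a braced single
  # letter, or a bare letter after at least one space.
  pattern = /\\c(?:\s*\{([A-Za-z])\}|\s+([A-Za-z]))/
  text.gsub(pattern) do
    letter = Regexp.last_match(1) || Regexp.last_match(2)
    # Compose letter + combining cedilla into one code point if possible.
    (letter + COMBINING_CEDILLA).unicode_normalize(:nfc)
  end
end

puts decode_cedilla('\c c')
puts decode_cedilla('\c{C}')
puts decode_cedilla('Fran{\c c}ois')
puts decode_cedilla('\c {abc}')
```

A multi-letter group such as \c {abc} is left untouched, since this sketch has no Unicode equivalent for an accent applied to a whole group. The outer braces in Fran{\c c}ois are also kept, because in BibTeX fields they may carry meaning of their own; stripping them would be a separate step.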
Alexey, I pushed latex-decode 0.0.8 which hopefully solves the problem:
mbp:latex-decode$ irb -r 'latex/decode'
001:0> LaTeX.decode '\c C'
=> "Ç"
002:0> LaTeX.decode '\c {cbc}'
=> "çbc"
003:0> LaTeX.decode '\c cab'
=> "çab"
004:0> LaTeX.decode '{\c c}'
=> "ç"
As I don't currently require the LaTeX filter myself, I haven't done any extensive testing; fingers crossed that this doesn't break any other conversions. If you have any other issues which pertain strictly to latex decoding, please post them over at the project repository.
Thanks for reporting this issue!
Thank you. However, I suggest that latex-decode should ignore '\c {cbc}' altogether, as the output you show does not resemble my pdfLaTeX's output :).
You're right, that doesn't make sense – we can't actually reproduce LaTeX's behaviour in that instance using unicode, I don't think. ;-)
Using convert(:latex) on a bibliography currently has mixed success. In particular, the following LaTeX commands are not always correctly filtered:
I am puzzled by this, as for some of these commands, latex-decode by itself appears to give the correct results. For example:
The following transcript gives an example that exhibits all of these problems (it involves one call to BibTeX.open and then one call to BibTeX.convert; a search for "\" will show where the problems lie):