Closed cpence closed 12 years ago
Does this also work for non-Latin characters (e.g., Greek or Cyrillic)? The citeproc test suite contains numerous test cases using non-Latin alphabets, so it's really important for upcasing and downcasing to work for those. If it does work, that would be awesome, because IIRC this was one of the biggest obstacles for 1.8 support.
Let me add that the current citeproc-ruby implementation is in the middle of a major rewrite; if you're interested, there's already a functional version which uses the JavaScript engine and should work on 1.8 (in fact, it's easier to get it to run on 1.8 at the moment). Going forward, the citeproc-ruby gem will become a native drop-in replacement for the JavaScript engine.
The ActiveSupport normalizer (implementation details here) claims to support all of Unicode 6.0. It's loading its tables from unicode_tables.dat here, which is a data file generated by this script, which does seem to be getting the proper UnicodeData.txt from here (on unicode.org).
Long and short of it, yes, ActiveSupport seems to support Everything. Of course, if you're getting the fallbacks, all bets are off. (If you'd like, the implementation could be made even more robust by adding the Unicode gem as a fallback, which does run on 1.8, and the Java to_upper/lower_case methods. Let me see if I can do that and add a few commits to this pull request this afternoon.)
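A rough sketch of what such a fallback chain might look like. The module name CaseMapping and the order of the candidates are assumptions for illustration, not what the pull request actually does; the only library calls used are UnicodeUtils.upcase/downcase, String#mb_chars from ActiveSupport, and Unicode.upcase/downcase from the unicode gem.

```ruby
# Sketch only: pick the first case-mapping backend whose library loads.
# CaseMapping is a hypothetical name; the candidate order is an assumption.
module CaseMapping
  # Each entry: [libraries to require, upcase lambda, downcase lambda]
  CANDIDATES = [
    [%w[unicode_utils],
     lambda { |s| UnicodeUtils.upcase(s) },
     lambda { |s| UnicodeUtils.downcase(s) }],
    [%w[active_support active_support/core_ext/string/multibyte],
     lambda { |s| s.mb_chars.upcase.to_s },
     lambda { |s| s.mb_chars.downcase.to_s }],
    [%w[unicode],
     lambda { |s| Unicode.upcase(s) },
     lambda { |s| Unicode.downcase(s) }],
    # Last resort: plain String methods, which are ASCII-only on older Rubies.
    [[],
     lambda { |s| s.upcase },
     lambda { |s| s.downcase }]
  ]

  # True if every library in the list can be required.
  def self.loadable?(libraries)
    libraries.each { |library| require library }
    true
  rescue LoadError
    false
  end

  _, UPCASE, DOWNCASE = CANDIDATES.find { |libraries, _, _| loadable?(libraries) }

  def self.upcase(string)
    UPCASE.call(string)
  end

  def self.downcase(string)
    DOWNCASE.call(string)
  end
end
```

Callers would then use CaseMapping.upcase(str) and CaseMapping.downcase(str) without caring which backend was found. The empty last entry always loads, so the chain never comes up empty, but that is also the "all bets are off" case for non-Latin text on 1.8.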
I've seen the rewrite on the horizon, but am pulling for citeproc-ruby, since I'd rather not spin up a JS runtime on my production machines. (I'm doing bibliographic formatting on the fly for a web application.)
That sounds great. It may even be worth adding ActiveSupport as a dependency outright to get rid of the 1.8 Unicode woes.
I just mentioned the JS engine as a caveat in case you spend time working on the current branch, since it is pretty much unmaintained. If you just want to work around a few simple problems, let me know when you're ready and I'll push a gem release for you. If you're thinking of something more complex or are missing features, we should definitely talk to see how your work could help expedite the rewrite.
In any case, your input is much appreciated. Cheers!
----- Reply message ----- From: "Charles Pence" reply@reply.github.com Date: Tue, Jan 31, 2012 9:59 pm Subject: [citeproc-ruby] Unicode fallbacks for Ruby 1.8 (#6) To: "Sylvester Keil" sk@semicolon.at
I'm certainly not missing features -- the current version does exactly what I need it to do (format my very specific, and very minimal, CSL snippets into bibliographic entries). Optimal would be to get the current version running "well enough" under Ruby 1.8, as the lack of citation formatting is the only big missing feature in my app when running under 1.8.
I'll bash on it for a little while and see if I can get anything useful together. If not, I'll certainly follow your advice and switch over to the JS edition. I'll ping this pull request again if I have a full patch for 1.8-1.9 compatibility together. (Maybe at least that would be useful for later versions -- I've already fixed, for example, the widespread use of #id and #type, which are reserved words under 1.8.)
Okay, with this set of changes, things run on Ruby 1.8 nearly as well as they do on Ruby 1.9. There are some funny issues that (I think) can be traced to the fact that Hash is unordered on 1.8, while it's ordered on 1.9: a few dozen new rspec failures have things in a different order than expected. There's also some silliness with DatePart on 1.8 that I can't be bothered to track down. But this code does fairly well: only 153 spec failures, which is around 25 more than on 1.9.
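For the ordering failures, one possible workaround is to iterate in key order wherever the output must be deterministic. This is only a sketch; each_pair_sorted is a hypothetical helper, not something in citeproc-ruby, and it does not help where the original insertion order matters (that would need an array of pairs instead).

```ruby
# Hypothetical helper: visit pairs in key order so the result does not
# depend on whether Hash preserves insertion order (1.9) or not (1.8).
def each_pair_sorted(hash)
  hash.keys.sort_by { |key| key.to_s }.each do |key|
    yield key, hash[key]
  end
end

names = []
each_pair_sorted(:volume => '3', :author => 'Doe', :issued => '2012') do |key, _value|
  names << key
end
```

Here names is filled in alphabetical key order on both Ruby versions.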
Of all of what I've done, here's what I think would be most important for Ruby 1.8 compatibility on your rewrite:
- Ruby 1.8 does not support "\uXXXX". The most portable way I've found to specify Unicode string literals is with a UTF-8 sequence of bytes ("\xNN\xNN\xNN") followed by str.force_encoding("UTF-8") on Ruby 1.9. There may be a better way to do this that I don't know, but the \u escapes definitely have to be changed.
- [...] nil in Ruby 1.8, for no good reason.
- To use #id and #type as class methods, undefine them out of Object on Ruby 1.8. (They were renamed to #object_id and #class in Ruby 1.9.)
Bah. Everybody should just upgrade! =)
But since they haven't, I'd really appreciate a new gem version when you get the chance. This will make CSL-formatted citations work under Ruby 1.8 for the vast majority of use-cases.
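To make the \u and #id/#type points concrete, here is a minimal sketch under some assumptions. utf8_literal and Record are hypothetical names, and the undef here is scoped to a single class rather than done on Object itself as described above, which is a more conservative variant and not necessarily what the patch does.

```ruby
# Sketch only; utf8_literal and Record are hypothetical names.

# Build a UTF-8 string from raw bytes without using a "\uXXXX" escape.
# force_encoding exists only on Ruby 1.9+, hence the respond_to? guard.
def utf8_literal(bytes)
  copy = bytes.dup
  copy.respond_to?(:force_encoding) ? copy.force_encoding('UTF-8') : copy
end

ALPHA = utf8_literal("\xCE\xB1") # UTF-8 bytes for U+03B1, Greek small alpha

# A record whose attributes are served by method_missing. On Ruby 1.8 the
# inherited Object#id and Object#type would answer first, so they are
# undefined in this class only (not in Object itself).
class Record
  [:id, :type].each do |name|
    undef_method(name) if method_defined?(name)
  end

  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(name, *args)
    @attributes.key?(name) ? @attributes[name] : super
  end
end

record = Record.new(:id => 'doe2012', :type => 'book')
```

On Ruby 1.9 and later the undef_method guard is simply a no-op, so the same class definition can be shared across versions.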
About \uXXXX strings: you might have noticed that we're working around this in latex-decode by converting strings to Unicode like this (1.8 only):
    def self.to_unicode(string)
      string.gsub(/\\?u([\da-f]{4})/i) { |m| [$1.to_i(16)].pack('U') }
    end
Obviously, this only makes sense if you know exactly which strings contain the codes as you probably wouldn't want to push every string through the method.
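A small sketch of why that caveat matters. to_unicode repeats the pattern quoted above as a standalone method; strict_to_unicode is a hypothetical variant (not part of latex-decode) that insists on the leading backslash, shown only for comparison.

```ruby
# Sketch only. to_unicode repeats the pattern quoted above; strict_to_unicode
# is a hypothetical variant that requires the leading backslash.
def to_unicode(string)
  string.gsub(/\\?u([\da-f]{4})/i) { [$1.to_i(16)].pack('U') }
end

def strict_to_unicode(string)
  string.gsub(/\\u([\da-f]{4})/i) { [$1.to_i(16)].pack('U') }
end

E_ACUTE = [0xE9].pack('U') # U+00E9, built without a "\u" escape

escaped = 'caf\u00e9'  # single quotes keep the backslash and "u" literal
plain   = 'bureau2012' # ordinary text that happens to contain "u2012"

converted        = to_unicode(escaped)      # "caf" followed by U+00E9
mangled          = to_unicode(plain)        # "u2012" is rewritten to U+2012
strict_converted = strict_to_unicode(escaped)
strict_untouched = strict_to_unicode(plain) # left alone: no backslash present
```

The loose pattern rewrites the tail of "bureau2012" because the backslash is optional, which is exactly why you would only feed it strings known to contain the codes.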
Landed in 0.0.4
UnicodeUtils doesn't build on Ruby 1.8. This change will give access to good upcase and downcase methods on ActiveSupport, as well as fallbacks.
Same sort of thing we were just hacking on over at latex-decode. This should get citeproc-ruby up and running on Ruby 1.8 -- unicode_utils won't even install, which keeps citeproc-ruby from installing.