inukshuk / citeproc-ruby

A Citation Style Language (CSL) Cite Processor
103 stars 23 forks source link

Renderer allows re-use of a passed in Locale object instead of reload #54

Closed jrochkind closed 6 years ago

jrochkind commented 6 years ago

Ref #53

CiteProc::Ruby::Engine class already does it here for styles: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/engine.rb#L117-L121

And CiteProc::Ruby::Format does it here for formats: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/format.rb#L31

Loading (or not) of all three types of objects is done in kind of different non-parallel ways, but that was kind of there in the code when I found it.

With this change for locale, I can avoid loading both locales AND styles (as well as formats while we’re at it although it probably doesn’t make as much of a difference) with EITHER of these APIs:

csl_style # assume exists a CSL::Style, prob previously loaded with CSL::Style.load
csl_locale # assume exists a CSL::Style, prob previously loaded with CSL::Locale.load

    cp = CiteProc::Processor.new style: csl_style,
       locale: csl_locale,
       format: CiteProc::Ruby::Formats::Html.new
    cp.import csl
    cp.render :bibliography, id: whatever

   csl_hash # assume exists, a single hash (not array) that’s csl-data.json format
   citation_item = CiteProc::CitationItem.new(id: csl_hash[“id"]) do |c|
      c.data = CiteProc::Item.new(csl)
    end

      renderer = CiteProc::Ruby::Renderer.new :format => CiteProc::Ruby::Formats::Html.new,
         :style => csl_style, :locale => csl_locale
      renderer.render citation_item, csl_style.bibliography

The latter is giving me pretty good performance, 7-9ms on my macbook, with already loaded style/locale.

The former still gives better performance than it did before an already loaded locale could be used.

There’s other code here: https://github.com/inukshuk/citeproc-ruby/blob/c6b81ea04868ad2afc704d9a38148df053cf4898/lib/citeproc/ruby/renderer/locale.rb#L11

…that at first I was looking at to avoid the load if it already has a locale…. I’m not totally sure when that code is used vs the code I did patch, but the thing I’m PR’ing here is the thing that seemed to work for both CiteProc::Processor use and CiteProc::Renderer use. I think.

None of these approaches were obvious from the docs. Perhaps a README PR giving examples of how to do this now with current code? Better convenience API for this use case might also be nice, but I’m not totally sure how to do it sensibly. Maybe just a new method on Renderer that lets you just pass in a csl_hash, without having to make the CitationItem yourself, and lets you pass in :bibliography if you want. (And why do I have to tell CitationItem the id, when that’s already in the hash?)

Or maybe the convenience API should actually be in terms of Engine, create an engine and just pass that in to render each time. Which maybe is possible now. I gotta figure out how to work with Engine directly. Is Engine thread-safe do you think? If not, then nevermind on this last paragraph.

jrochkind commented 6 years ago

I don't know why this is failing travis only in 2.3. I don't think it's related to my change. @inukshuk ?

Also, do you think CiteProc::Ruby::Engine is thread-safe? Whether it is or not depends on how I may approach my attempt a convenience API.

jrochkind commented 6 years ago

Also note I didn't ADD any tests for this; I couldn't figure out what/how to add a test. The cucumber stuff confuses me, i'm not good with cucumber.

inukshuk commented 6 years ago

The errors on 2.3 are definitely unrelated, I'll look into it.

The Engine class is still a remnant from having the two renderer engines (js and Ruby), it's something I'd like to get rid of when we finalize the processor API (i.e., it's better to think in terms of Processor and Renderer, with Engine concerns moving to Processor). Processor/Engine carry a lot of state (all cited items, their order, disambiguations, abbreviations, etc.) so they'll require more thought/analysis in regards to thread safety. So I'd concentrate on using the Renderer directly. I hope this will make the use case easier to explain, too. As in: If you just want to render one-off references, you don't need the entire processor API, but only the renderer, a style, and your item (and locale and format).

The processor API is inherently more complex: you import your items, you start citing items, you generate the bibliography, cite more items, the bibliography changes etc. So for a simple API, I think it totally makes sense to expose the renderer and make it explicit that the renderer is a simpler construct than the whole processor, albeit one that is sufficient for many use cases.

jrochkind commented 6 years ago

Thanks!

I see now that the 'engine' has a reference to a 'renderer', so if the 'renderer' isn't concurrency-safe, the engine can't be either!

I think I have a good idea for a convenience api that doesn't take much code, I'll try to PR it, maybe along with some README modifications (let me know if you want the README modifications in a different PR, I might suggest reorganizing some of what is there). Thanks for this code!

inukshuk commented 6 years ago

Any PRs welcome (organize them as you like). I'm also happy to give write access to anyone who contributes significantly. (The sane thing to do, actually, would be to make the renderer-only API the default, at least while the full processor API is not complete)

jrochkind commented 6 years ago

I had no idea the full processor API was considered not complete! it wouldn't hurt to have some notes in the README and what parts you the developer consider need more work in what ways. But I don't have the context for that.

What is the minimum level of ruby you support? The README says 1.9+, but I see travis is only testing on 2.3, 2.4, and 2.5. (Indeed rubies before that are out of support by ruby team or soon will be). (Plus travis is testing jruby in 1.9 mode -- any jruby that takes a 1.9 mode argument is also no longer supported by team jruby, it's now only JRuby 9.0.0.0 and subsequent).

Mainly I want to know if i can use ruby keyword arguments, introduced in 2.0, which make some things a lot easier and more reliable.

inukshuk commented 6 years ago

That's what I kept trying to say : ) Use the renderer: the renderer implements all CSL rendering nodes and is pretty much complete. Many of the advanced 'processor' features, many of which are not part of the CSL spec itself and which, I imagine, many apps would actually want to control themselves, are not. It's certainly true that the library could do with some documentation to make it more approachable; in fact, it may make sense to ditch the idea of implementing all the processor features and just provide a good interface to the renderer. The simple truth is that I only ever needed the CSL renderer myself and the README is an attempt at a minimal example to give you some idea what can be done -- I just simply ran out of steam there!

The library really started out with Ruby 1.8; since Unicode and Regexp support is pretty crucial, we had an evolving compatibility layer using various ways to support the different versions. At some point I stopped testing Rubies which reached end of life status on Travis CI, but did not consequently remove old hacks and workarounds. So this probably still works on 1.9 if you manage to install all the old Gems, but today I'd say that it really support 2.3+ (because that's what we test) -- or that's what I'm willing to maintain, anyway. (In other words, keyword arguments are fine if you ask me)

jrochkind commented 6 years ago

OK, thanks! Is there anyone to ask other than you? You're running this show right? :) So I'll use keyword args if convenient in my next PR, and if so also update the README to say ruby 2.0+ in my README PR. I like writing docs, so I'll write what I think is a better README based on what I've learned so far (and what I PR).

inukshuk commented 6 years ago

Sounds good! The csl Gem is used to run the official CSL tests, so there are more stakeholders; and for general CSL/CiteProc concerns the CSL maintainers, citeproc-js and Zotero developers are also involved -- but as far as the Ruby API goes I think it makes sense to do whatever is best for a typical Ruby developer; in other words, your opinion is as good as mine!