inukshuk / citeproc-ruby

A Citation Style Language (CSL) Cite Processor
103 stars 23 forks source link

advice: convenience api for Renderer that takes a JSON hash #55

Open jrochkind opened 6 years ago

jrochkind commented 6 years ago

Hi @inukshuk . I'd like to PR a convenience API for rendererer, like we were talking about, that makes it easier for the use case "I have a citation, I want to format it on it's own, not as part of a bibliography".

The main fairly simple thing I think this needs, over what's there now, is a method that takes a ruby Hash with the json representation of a citation according to the csl-data.json schema.

(As an aside, I'm not sure what this Hash thing is called. I've seen it called "CSL-json", and "Citeproc-json", and then the schema has the word "data" in it suggesting "CSL Data Json". Or we could call it CSL Citation Json or CSL Reference Json, since it's a representation of a single citation/reference. But I've never seen it called that. Any idea what the current community popular thing to call this thing is?)

Anyway, adding this convenience API is fairly simple, but I can think of two ways to do it, and I'm not sure which is best.

1. Add a new method, that's like Renderer#render, but with different arguments. Not sure what to call it, maybe "render_from_hash":

      def render_from_hash(csl_json_hash, style_mode)
        citation_item = CiteProc::CitationItem.new() do |c|
          c.data = CiteProc::Item.new(csl_json_hash)
        end
        render citation_item, style_mode
      end

    renderer.render_from_hash(csl_data_json_hash,  
      CSL::Style.load("chicago-bibliography-notes").bibliography)

Downsides maybe only that it's confusing ot figure out how to name it, confusing that there are two "render" methods, adds to the public API size.

2. Extend the existing #render method so it can take either a CiteProc::CitationItem (as it assumes now), or csl-json-hash that it converts into one. Something like:

      def render(item, node)
        unless item.is_a?(CiteProc::CitationItem)
            item = CiteProc::CitationItem.new() do |c|
              c.data = CiteProc::Item.new(csl_json_hash)
            end
        end

        raise ArgumentError, "no CSL node: #{node.inspect}" unless
          node.respond_to?(:nodename)

        specialize = "render_#{node.nodename.tr('-', '_')}"

        raise ArgumentError, "#{specialize} not implemented" unless
          respond_to?(specialize, true)

        format! send(specialize, item, node), node
      end

Disadvantages might be that it makes it kind of confusing that the same method can take different kinds of arguments, makes it maybe non-obvious that passing an actual CitationItem is the preferred simplest way when it is not inconvenient, and it might be a slightly at least potential performance hit to have to check #is_a? every time you do #render

Any thoughts on which of these approaches is preferable @inukshuk ?

inukshuk commented 6 years ago

Yeah, the CSL input format is a bit of an open issue, because the CSL/CiteProc JSON input is not part of the spec; what we use today is basically the input that citeproc-js originally defined (there used to be a great manual for it as well, I can dig it up if you're curious). IIRC we decided to formalize it and add it to the spec, but this process is not complete yet I believe. Here is the draft version though (which, I hope, is supported by citeproc-ruby). The draft calls it CSL input data, so that's a good call I'd say. (The one thing I would add is that, going forward, dates can also be ISO 8601 or EDTF strings -- the next version of ISO will include EDTF.)

Regarding your question, I think I'd prefer option 2 as an outward facing API. I would not worry about the performance hit, because if you want to pass the CitationItem you may actually prefer to call render_bibliography or render_citation directly (we could consider moving the call to format! into those methods though). In other words, let's make render the easy entry point, with additional conversions, checks etc. -- more advanced users are more likely to work around it if they wish to do so.

jrochkind commented 6 years ago

Thanks! Woah, I didn't realize render_bibliography and render_citation were methods... I'll investigate those, I wonder if those should be the top-level entry points?

I think the ambiguously named "CSL input data" is actually incredibly useful and important, I hugely support the efforts to make it official. The fact that the csl-data.json schema exists in the repo (and is mentioned in the repo) already makes it seem official, that's pretty much the minimal qualification for a spec, which it already meets. I didn't realize that schema was considered a draft -- it's still hard for an outsider user (rather than participant in standardization efforts) to figure out what's up with CSL!

I'd suggest that schema quickly be considered a "1.0" rather than a draft -- many software are already using it, we need it.

inukshuk commented 6 years ago

There are actually render_* methods for every CSL rendering node; this is useful for working with / testing CSL style fragments (i.e., you can render a single number or text node).

That said, render_bibliography and render_citation are special, because they are the 'normal' entry points for cite rendering. But yeah, I think you're right, for advanced users of the renderer (such as the processor API) those could be used directly. For the easy-to-use API I think render makes the most sense. I.e., you load your style, you fetch your items, you pass them to render with style.bibliography or style.citation, depending on what you want to render.