projectblacklight / blacklight

Blacklight provides a discovery interface for any Solr (http://lucene.apache.org/solr) index.
http://projectblacklight.org/
Other
758 stars 257 forks source link

Blacklight::Solr::Document extensions #236

Closed MrDys closed 12 years ago

MrDys commented 12 years ago

CODEBASE-208: After thinking about it for a while and prior discussions with Matt Mitchel et al, I think it's actually pretty straightforward to implement.

A Document 'extension' will simply be a ruby module providing additional behavior for a Document in the form of methods.

However, this module will not be 'include'd in B::S::Document at the class-level, because the point of these extensions is that their behavior applies only to SOME documents, for instance only 'marc' documents. So the extension Modules will be applied to individual B::S::Document instances, in the Document constructor, using ruby 'extend', and using logic supplied when the extension is 'registered'. For instance, for MarcExtension, in an initializer or other startup code (as extensions can be bundled with BL, can be purely local, or can come from an extra-Blacklight plugin).

Blacklight::Solr::Document.use_extension( MarcExtension ) { | document | document.key?( marc_source_field ) }

So MarcExtension will only be 'extend'ed onto Document if the block returns true, for a given document.

The main use case for these extensions at present is supplying conversions of a B::S::Document to another format. By convention, methods named "to__" will be used to convert to a ruby object of some type, and "export_as__" will be used to convert to a String suitable for delivery to an external client.

So for Marc, we will have: someDocument.to_marc => returns a ruby_marc object someDocument.export_as_marc21 => marc21 in a String someDocument.export_as_marcxml => marcXML in a String.

Now client code, if it wants to know if Marc representation of a given B::S::Document is available, can just call: someDocument.respond_to?(:to_marc) and... someDocument.to_marc => to actually get it.

No more of this "someDocument.marc.marc" business, because the client code should not have to care about what code the #to_marc came from (maybe the original indexed doc wasn't even marc, but was something that can be converted to marc, and a to_marc is provided to do that), it just cares if a ruby_marc representation is available.

There's one more piece. We want the Blacklight core code (controllers and such) to be able to know which "export_as" formats are available for a particular document, so that the controller can respond with the proper format ( /catalog/show/id.marcxml, /catalog/show/id.refworks, etc).

In order to do this, an individual B::S::Document simply needs to have a list of all the exportas* it supports. (Some may be built-in generically to any B::S::Document based only on stored fields in Solr, of course).

An individual extension can over-ride it's "extended" method to tell the B::S::D what export_as it supports....

module MarcExtension [...]

def self.extended( document ) document.will_export_as( :marcxml, :marc21, :something_screwy ) end

def export_as_marcxml.... def export_as_marc21.... def export_as_something_screwy end

Now the catalogcontroller show action can do something like this:

respond_to do |format| @document.will_export_as.each do | can_export_as | format.send(can_export_as) { @document.send("export_as" + can_export_as) } end

And by magic, any format added by any extension is available from show/id.format . Similarly, any export_as format supported by a given document can be advertised in HTML , or in Atom feed using the proper element for advertising alternate formats, or in UnAPI, etc.

MrDys commented 12 years ago

Original reporter: jrochkind

MrDys commented 12 years ago

jrochkind:

The work to set up the basic support framework is 90% done in: http://github.com/jrochkind/blacklight/tree/document-extension

I have not yet started looking at refactoring the Marc extension to use this, so let's use an example of a hypothetical WeirdExtension, that, among possibly other things, adds the ability to convert a SolrDocument to application/weird. Here is how you would do this under the extensions stuff I've got in the branch.

In an init.rb or initializer, you've got register application/weird, no way around it I could find.

Mime::Type.register("application/weird", :weird)

Now your extension module, defined anywhere -- bundled with blacklight, in your local app, in an extra-blacklight plugin, whatever:

module WeirdExtension def self.extended(document) document.will_export_as(:weird) end

def export_as_weird
     #Some code to translate to an application/weird object, and...
    return weird_obj
end

end

Now in some initializer or other startup code:

SolrDocument.register_extension( WeirdExtension ) { |document| my_code_to_determine_weird_applicable?( document ) }

( If you somehow use a model other than SolrDocument, that's fine, this will work on anything that includes Blacklight::Solr::Document, as SolrDocument does).

That is literally it. WeirdExtension will be added to documents your method determiend were applicable. For those documents and those documents only:

It takes less code than you might think to accomplish this, it's all pretty elegant. (And fully test-covered. Or nearly. I said 90% done after all).

This same pattern will make it pretty easy to add:

MrDys commented 12 years ago

jrochkind: Added a class-level extension_parameters hash on any class that includes Blacklight::Solr::Document.

class SolrDocument include Blacklight::Solr::Document

[...] end

SolrDocument.extension_parameters[:marc_format_type] = :marc21 SolrDocument.extension_parameters[:marc_source_field] = "marc"

The use case is clear from this example: A more general purpose and light weight device to store parameters like were previously being stored in class methods dynamically added to the class. Now the module (such as a Marc module) can simply look at self.class.extension_parameters[:key] for whatever class-level parameters it needs.

MrDys commented 12 years ago

jrochkind: This is pretty much done except for writing better in-class documentation. And thinking through if we really want the extension stuff to automatically add formats to Rails Mime::Type if needed, or instead it should just fail with an exception if the relevant format is not already registered. Either way, adding formats yourself in an initializer is recommended for things to work sensibly, no good way to have the extension framework automatically do this in a particularly sensible way.

MrDys commented 12 years ago

jrochkind: Extensive rdoc comments written.

The basic architecture for the extension and export framework is done and (i believe) ready to be committed to trunk, in: http://github.com/jrochkind/blacklight/tree/document-extension

I am working on the refactoring of the Marc stuff to use the new architecture in a seperate branch that can be considered and applied after applying the basic architecture. http://github.com/jrochkind/blacklight/tree/document-extension-with-marc

MrDys commented 12 years ago

jrochkind: I'd add thinking about it more that part of the difference between here and marc-pluggable is that Blacklight document extensions are likely to come with a bundle of export formats.

We have a Marc extension that might come with special exports not only to Marc21 and MarcXML, but exports based on marc to RIS, OpenURL, RDF, a few more things.

Then we'd have an EAD extension that comes with special exports based on EAD to, not only EAD itself, but, say, OAI-DC, RIS, OpenURL, etc.

So I think each extension is likely to come with a bundle of formats, but you want to install it as a package, to get "Bess's EAD stuff" or something. That might have motivated the silghtly different design than you used with ruby-marc?

It's also set up to support individual format type over-riding though. Let's say you install Bess's EAD extension, but you don't like the way it implements EAD-to-RIS. No problem, supply your own EadToRis extension, register it after BessEad, and it's own export_as_ris will end up being "on top".

Make some sense you think? I think it handles what you're talking about in a pretty clean way. And now I'll shut up. For now. :)

MrDys commented 12 years ago

billdueber: Jonathan, instead of a bunch of export_as_blah_blah methods, wouldn't it make more sense to allow additions to SolrDocument to register one or more export types and then have the call be export_as(:type)? A Document instance could trivially report what export options are installed, and writers could be registered in a plug-in sort of manner (kind of like what I'm doing with ruby-marc-plugable -- see http://github.com/billdueber/ruby-marc-plugable).

MrDys commented 12 years ago

jrochkind: Okay, in addition to the branch establishing the basic Document Extension architecture, I now have a branch which has refactored all Marc-specific code in Blacklight to USE the new architecture.

The idea is that FIRST we evaluate and commit the branch simply establishing the architecture: http://github.com/jrochkind/blacklight/tree/document-extension

This simple establishing-the-architecture branch is VERY unlikely to cause anyone any backwards compatibility problems. ALL existing tests in BL still pass, unchanged, as a result of this architecture. [Of course there are new tests for the architecture itself].

SECOND, once we've committed the basic architecture branch, we evaluate and hopefully eventually commit the branch refactoring MARC functionality to use this archicture: http://github.com/jrochkind/blacklight/tree/document-extension-with-marc

While I tried to make the marc-refactoring as backwards compatible as possible, leaving in "deprecated" old-style methods, it did involve some significant changes, so will need more careful evaluation.

OVERVIEW OF MARC REFACTORING, AND CHANGES In the marc refactored to use new document extension architecture branch: http://github.com/jrochkind/blacklight/tree/document-extension-with-marc

A Marc Document Extension is provided at: Blacklight::Solr::Document::Marc ( lib/blacklight/solr/document/marc. An argument could be made that these should live in app/model instead, but I have not changed this for now, this is where this stuff has been).

This extension will be applied only to documents that actually have marc. It will add a #to_marc method to those Blacklight::Solr::Document objects (such as SolrDocument, our BL Blacklight::Solr::Document implementing class -- this is confusing, but this is not a change, this SolrDocument and Blacklight::Solr::Document stuff!).

It will also add various conversion methods: #export_as_marc (marc binary), #export_as_marcxml, #export_as_endnote, #export_as_refworks_marc_txt, and a few more.

Through the magic of the Document Extension Architecture, as a result of these export_as being provided and registered, the catalog/show controller will automatically provide these export formats by content-negotiation and format suffix.

The new way to register this Marc extension is, in your blacklight_config.rb (or any other initializer):

SolrDocument.extension_parameters[:marc_source_field] = :marc_display SolrDocument.extension_parameters[:marc_format_type] = :marc21 SolrDocument.use_extension( Blacklight::Solr::Document::Marc) do |document| document.key?( :marc_display ) end

The old way of doing it will still work FOR NOW through some hacky code, but with an ugly deprecation warning:

DEPRECATED

SolrDocument.marc_source_field = :marc_display SolrDocument.marc_format_type = :marc21

CHANGES TO FORMAT NAMES AND FACTORINGS

Previously, we had a format called ":refworks". This was confusing because Refworks standard format is "Refworks Tagged Format", and what we were returning was NOT that. What we were/are returning is a weird proprietary Refworks format for expressing Marc in plain text. It doesn't seem to have an official name, so I called it :refworks_marc_txt. [The built-in tools for directing to RefWorks Export have been changed to use it. ANY future document extension that provides and registers export_as_refworks_marc_txt will be used by the export to Refworks button -- that's what the new Document Extension Architecture does! ]

Previously, the (now deprecated) Marc helper object ( Blacklight::Marc::Citation) offered a #to_zotero method. This was confusing: What this method was REALLY creating was a COinS (Content Object in Spans) object, which is used by zotero (at least until we provide better methods for Zotero), but is not unique to Zotero. A COinS is an OpenURL Context Object in kev format, embedded in a object. I decided the proper place for generating the span object itself is in View code. But the Document should generate the context object as kev. So the MarcExtension offers an export_as_context_object_kev method, and the BL view code actually embeds it in a span -- and will generate a span like that for ANY Document that offers an export_to_context_object_kev.

REMOVED CODE

MOST of the code from the previous way of doing these things has been left there (deprecated), but a couple methods were outright removed. ApplicationHelper#render_to_refworks_txt and ApplicationHelper#render_to_endnote_txt. These methods are gone now in this branch; this functionality is handled by the Marc Document Extension (and similarly can be by any future extensions).

MIGRATION AND BACKWARDS COMPATIBILITY

I went through extra effort and time to try to make this as backwards compatible as possible. Calling document.marc to get a (now deprecated) Blacklight::Marc::Document Marc Helper object IS still supported -- and you get that marc helper object unchanged, if it worked before, it will still work. You can call document.marc.marc as before. You do get nil from document.marc on a Document with no marc in it.

However, NO code currently in Blacklight core does this anymore. (Except for a couple tests on the deprecated code, naturally). It is deprecated, and local code that uses it needs to be changed, it will go away eventually.

You CAN still call: SolrDocument.marc_source_field = :marc_display SolrDocument.marc_format_type = :marc21

in blacklight_config. Calling BOTH these methods will have the effect of registering thew new Blacklight::Solr::Document::Marc extension object with the neccesary parameters. This is DEPRECATED though, and should be changed in local apps, it won't work forever.

So i've tried to be backwards compatible (with deprecation). However, SOME blacklight code HAS changed. Since almost ANY Blacklight code is subject to being monkey-patched or relied upon by local customization, some local customizations MAY break.

All Blacklight core view templates have been changed to use the new architecture. However, if you've copied-and-pasted and 'frozen' Blacklight core views in the past, and they're still using the old architecture -- you will have to fix them. They will PROBABLY keep working using deprecated API (until it is removed), but in some cases may not.

DEPRECATION WARNINGS: HOW SHOULD IT BE DONE?

This is one question still out there I need feedback on. What is the best way to warn people about their use of deprecated methods?

Right now, calling a deprecated method simply writes out with Kernel.warn a method (but no stack trace). It will do this in any environment, and is not configurable.

Do we need more fine-grained power over deprecation warnings? Or should we not worry about it, thinking people will just fix their apps to not use deprecated API quickly instead? I was unable to find any existing gem or Rails feature that supported deprecation warnings in the way we're likely to want (I did evaluate several).

But if we're clear about what we want, I can write a little utility method that just does it. Should it print out a stack trace? Should it do different things in development vs production vs test? Does it need to be configurable so everyone can make it work different? (I guess they could do that just by monkey patching the hypothetical Blacklight.deprecation_warning, unless we need it to be REALLY EASY to customizize; I doubt we do). Etc. Tell me what you want in deprecation warnings. We need something, because these deprecated methods DO need to go away (and sooner rather than later), to avoid code bloat.

MrDys commented 12 years ago

jrochkind: Possibly Bill. But the extension still needs to provide the logic SOMEWHERE, right? And it needs to provide it in an individual method so another plugin can over-ride just THAT export type.

But I'm already half-way with you. There actually IS an export_as(:type) method in there already. The default implementation just calls .send("export_as#{type.to_s}"). But that's already there. And there are in fact comments to the effect that clients should preferably call export_as(:type) -- so it's kind of paving the way to maybe eventually change it up.

But my thinking was that the actual logic has to be provided SOMEWHERE. And the logic for each type needs to be provided seperately both for cleanliness of code (and testing), and so extensions can easily over-ride just exactly the types they want to over-ride.

In the current design a Document instance can trivially export what options are installed: Ask the document document.export_formats, and you get the report (a hash with key of format short name, and value of mime-type).

And writers CAN be registered in a plug-in sort of manner, that's indeed the whole point of the architecture! Here is how you write and register a plugin:

module MyPlugin self.extended(document) document.will_export_as(:type, "application/type") end def export_as_weird_type "weird_type" end end

Register in a plug-in sort of manner? No problem: SolrDocument.use_extension(MyPlugin). # The end

So I think I am in fact accomplishing what you want to accomplish, but maybe in a slightly different manner. What do you think?