Open tpendragon opened 6 years ago
@awead and @cam156 would you mind commenting?
Maybe we could approach this in the same way we do RDF predicate mappings.
@awead Can you expand on what you mean?
@tpendragon I was thinking of architectural consistency. We can inject different kinds of mechanisms to map properties to RDF predicates. Perhaps we can use the same technique for indexing? It's similar: mapping properties to one or more solr fields.
Some discussions from tonight: Is an indexer just a "write only" persister? What's the difference between an indexer and any sort of transformation logic from the hash that is a Valkyrie::Resource
to some other representation? Is it useful to have APIs here?
I like the idea of an indexer abstraction in Valkyrie. My thoughts on this...
Having an abstraction allows Hyrax to write its indexers compliant with the Valkyrie indexer abstraction. Then Hyrax can be delivered out of the box with the Solr indexer and allow sites to switch to using an elastic search indexer just by changing the registration. Something like...
Valkyrie::IndexAdapter.register(
Valkyrie::SolrIndexAdapter,
:solr_indexer
)
Valkyrie.config.indexer_adapter = :solr_indexer
# BECOMES
Valkyrie::IndexAdapter.register(
Valkyrie::ElasticSearchIndexAdapter,
:elastic_indexer
)
Valkyrie.config.indexer_adapter = :elastic_indexer
Additional benefits of moving away from a fully functioning metadata adapter...
@tpendragon I see there hasn't been much movement on this. Are there plans to move this forward?
@elrayle Sorry, yes. We intend to do this, but it needs some research into the best interface. It looks like y'alls work on Wings is going to be a good smoke test on how useful the abstraction will be.
Thoughts and updates from initial progress on this in Hyrax / Wings.
I wrote an indexing adapter that registers the same way other adapters do. I started working on a solr indexing adapter, hoping to re-use valkyrie code directly. Initially it looked like I may be able to use the solr metadata adapter repository, but that didn't pan out because:
it required a resource factory and that required a metadata adapter, so I wondered if some of those classes could maybe be detangled a bit. Maybe this is a non-issue if we go with @tpendragon's proposal to make the metadata adapter take a "write-only" option.
in the context of an indexing adapter or write-only adapter, you don't want to return the document translated back into a valkyrie resource. and you can't actually because you won't likely be storing all that data in the index. This work happens in the repository, which you'd likely want to be able to share whether you have a separate registry or a write-only option. So it seems like we'll want to pull that transformation up a level or add in switches or something.
I am essentially pulling code out of valkyrie and simplifying it a bunch instead of actually reusing anything. Figuring out how the indexing behavior itself should work, I stripped out some logic that seems too opinionated, e.g. places where all the values are indexed by default -- these should probably be extracted into a proper indexer that you can configure to use or not use. https://github.com/samvera/valkyrie/blob/master/lib/valkyrie/persistence/solr/model_converter.rb#L63
In working on the indexer concept in Hyrax, a difference between active fedora's generate_solr_document
and valkyrie's to_solr
seems worthy of further discussion. The valkyrie solr metadata adapter builds the index by running through all the indexers and concatenating a big hash. I'm thinking of a scenario where Hyrax has defined its core indexers and someone else wants to remove one of those values, maybe because they want it in a field with a different name, or maybe they don't want it at all. I think they'd need to override the Hyrax indexer class in that case, and they'd just configure that in the initializer. That's probably fine. But the active fedora way of doing it is to build the hash through yielding a block; I think that allows just deleting a field and putting in a different field, which feels more light-weight.
@tpendragon commented: I'm not sure how I feel about mutating the hash, but there's actually a ticket in Valkyrie that proposes yielding the context: https://github.com/samvera/valkyrie/issues/599...
I'll be happy if we're able to figure out a good pattern, nobody's ever really liked to_solr
and the current indexer is basically just an OO version of that.
Is it important to be able to mutate the actual hash? @no-reply ?
Assuming that the Hyrax indexing config is roughly the same as Figgy (https://github.com/pulibrary/figgy/blob/master/config/initializers/valkyrie.rb#L224-L238), couldn't an implementation remove the indexing class they didn't want from Hyrax's config somehow?
I've become convinced that the #to_solr
method composition approach is actually quite powerful for Hyrax's use cases.
Decoupling it entirely from the model seems good. We've done that in Hyrax with a factory method Hyrax::ValkyrieIndexer.for(resource: my_resource)
. From the old ActiveFedora::IndexingService
classes, we drop #generate_solr_document
in favor of just #to_solr
. Modules can be injected with include
and prepend
, and we have a houndstooth style config-based module builder system with include/prepend Hryax::Indexer(:core_metadata)
Right now we index via a full MetadataAdapter, but it leads to bloat in the index and may result in strange inconsistencies between what you want for pulling out a Valkyrie object and what you want for search and display.
An Indexer would probably just have a pattern for building one up, some sane defaults,
save_all
, anddelete_all
.When developing the indexer's API, think of it as a write-only MetadataAdapter. This will make it easy for downstream users to transition.