Consider adding a "Write-Only" Adapter

tpendragon commented 6 years ago

Right now we index via a full MetadataAdapter, but it leads to bloat in the index and may result in strange inconsistencies between what you want for pulling out a Valkyrie object and what you want for search and display.

An Indexer would probably just have a pattern for building one up, some sane defaults, save_all, and delete_all.

When developing the indexer's API, think of it as a write-only MetadataAdapter. This will make it easy for downstream users to transition.

tpendragon commented 6 years ago

@awead and @cam156 would you mind commenting?

awead commented 6 years ago

Maybe we could approach this in the same way we do RDF predicate mappings.

tpendragon commented 6 years ago

@awead Can you expand on what you mean?

awead commented 6 years ago

@tpendragon I was thinking of architectural consistency. We can inject different kinds of mechanisms to map properties to RDF predicates. Perhaps we can use the same technique for indexing? It's similar: mapping properties to one or more solr fields.

tpendragon commented 6 years ago

Some discussions from tonight: Is an indexer just a "write only" persister? What's the difference between an indexer and any sort of transformation logic from the hash that is a Valkyrie::Resource to some other representation? Is it useful to have APIs here?

elrayle commented 5 years ago

I like the idea of an indexer abstraction in Valkyrie. My thoughts on this...

Having an abstraction allows Hyrax to write its indexers compliant with the Valkyrie indexer abstraction. Then Hyrax can be delivered out of the box with the Solr indexer and allow sites to switch to using an elastic search indexer just by changing the registration. Something like...

  Valkyrie::IndexAdapter.register(
    Valkyrie::SolrIndexAdapter,
    :solr_indexer
  )
  Valkyrie.config.indexer_adapter = :solr_indexer

# BECOMES

  Valkyrie::IndexAdapter.register(
    Valkyrie::ElasticSearchIndexAdapter,
    :elastic_indexer
  )
  Valkyrie.config.indexer_adapter = :elastic_indexer

Additional benefits of moving away from a fully functioning metadata adapter...

removes solr doc bloat as it removes the requirement that all attributes be in the solr doc to allow reconstruction of the resource from the solr doc. Solr doc bloat has the potential to effect search results.
provides more flexibility in the handling of fields (e.g. dates can be in the solr doc as dates optimized for range searching as a date facet)

elrayle commented 5 years ago

@tpendragon I see there hasn't been much movement on this. Are there plans to move this forward?

tpendragon commented 5 years ago

@elrayle Sorry, yes. We intend to do this, but it needs some research into the best interface. It looks like y'alls work on Wings is going to be a good smoke test on how useful the abstraction will be.

hackartisan commented 5 years ago

Thoughts and updates from initial progress on this in Hyrax / Wings.

I wrote an indexing adapter that registers the same way other adapters do. I started working on a solr indexing adapter, hoping to re-use valkyrie code directly. Initially it looked like I may be able to use the solr metadata adapter repository, but that didn't pan out because:

it required a resource factory and that required a metadata adapter, so I wondered if some of those classes could maybe be detangled a bit. Maybe this is a non-issue if we go with @tpendragon's proposal to make the metadata adapter take a "write-only" option.
in the context of an indexing adapter or write-only adapter, you don't want to return the document translated back into a valkyrie resource. and you can't actually because you won't likely be storing all that data in the index. This work happens in the repository, which you'd likely want to be able to share whether you have a separate registry or a write-only option. So it seems like we'll want to pull that transformation up a level or add in switches or something.

I am essentially pulling code out of valkyrie and simplifying it a bunch instead of actually reusing anything. Figuring out how the indexing behavior itself should work, I stripped out some logic that seems too opinionated, e.g. places where all the values are indexed by default -- these should probably be extracted into a proper indexer that you can configure to use or not use. https://github.com/samvera/valkyrie/blob/master/lib/valkyrie/persistence/solr/model_converter.rb#L63

hackartisan commented 5 years ago

In working on the indexer concept in Hyrax, a difference between active fedora's generate_solr_document and valkyrie's to_solr seems worthy of further discussion. The valkyrie solr metadata adapter builds the index by running through all the indexers and concatenating a big hash. I'm thinking of a scenario where Hyrax has defined its core indexers and someone else wants to remove one of those values, maybe because they want it in a field with a different name, or maybe they don't want it at all. I think they'd need to override the Hyrax indexer class in that case, and they'd just configure that in the initializer. That's probably fine. But the active fedora way of doing it is to build the hash through yielding a block; I think that allows just deleting a field and putting in a different field, which feels more light-weight.

@tpendragon commented: I'm not sure how I feel about mutating the hash, but there's actually a ticket in Valkyrie that proposes yielding the context: https://github.com/samvera/valkyrie/issues/599... I'll be happy if we're able to figure out a good pattern, nobody's ever really liked to_solr and the current indexer is basically just an OO version of that.

Is it important to be able to mutate the actual hash? @no-reply ?

escowles commented 5 years ago

Assuming that the Hyrax indexing config is roughly the same as Figgy (https://github.com/pulibrary/figgy/blob/master/config/initializers/valkyrie.rb#L224-L238), couldn't an implementation remove the indexing class they didn't want from Hyrax's config somehow?

no-reply commented 4 years ago

I've become convinced that the #to_solr method composition approach is actually quite powerful for Hyrax's use cases.

Decoupling it entirely from the model seems good. We've done that in Hyrax with a factory method Hyrax::ValkyrieIndexer.for(resource: my_resource). From the old ActiveFedora::IndexingService classes, we drop #generate_solr_document in favor of just #to_solr. Modules can be injected with include and prepend, and we have a houndstooth style config-based module builder system with include/prepend Hryax::Indexer(:core_metadata)

samvera / valkyrie

Consider adding a "Write-Only" Adapter #598