samvera / hydra-works

A ruby gem implementation of the PCDM Works domain model based on the Samvera software stack

Deleting Works not removing list_source and indirect_containers from Solr #274

Open roryegerton opened 8 years ago

roryegerton commented 8 years ago

We currently have the below situation where file_sets are members of works

bw = BibliographicWork.new
bw.save

fs = BibliographicFileSet.new
fs.save

bw.ordered_members << fs
bw.save

This creates the following objects/documents in Fedora and Solr:

1 - ActiveFedora::Aggregation::ListSource
1 - ActiveFedora::Aggregation::Proxy
1 - ActiveFedora::IndirectContainer
1 - BibliographicWork
1 - BibliographicFileSet

When I destroy the FileSet, it deletes itself and its proxy from Fedora and Solr

fs.destroy

This leaves the following objects/documents in Fedora and Solr:

1 - ActiveFedora::Aggregation::ListSource
1 - ActiveFedora::IndirectContainer
1 - BibliographicWork

So far so good, this is all as expected.

However, when I destroy the Work, I am still left with the IndirectContainer and the ListSource in Solr

bw.destroy

This leaves Solr documents for:

1 - ActiveFedora::Aggregation::ListSource
1 - ActiveFedora::IndirectContainer

In Fedora these aren't accessible, as the parent of both objects is a tombstone

As a workaround for now, to ensure that everything is deleted, I can run the following commands to delete the indirect_container and list_source before destroying the Work:

bw.list_source.destroy
# we know there is only one, otherwise we'd loop through each
indirect_container = ActiveFedora::IndirectContainer.where(id: "#{bw.id}/members").first
indirect_container.destroy
bw.destroy

I wonder should this be done in the PCDM models instead?
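For illustration only, here is a rough sketch of the ordering that workaround relies on, moved into the model's own destroy. It uses plain-Ruby stand-ins, not the real ActiveFedora or PCDM classes: `INDEX`, `Resource`, and `Work` are hypothetical names, and the `list_source` id suffix is an assumption. Only the `/members` suffix comes from the commands above.

```ruby
# Plain-Ruby stand-ins; none of these classes are ActiveFedora's.
INDEX = {} # id => document, stands in for the Solr index

class Resource
  attr_reader :id

  def initialize(id)
    @id = id
    INDEX[id] = { "id" => id, "model" => self.class.name }
  end

  # Destroying a resource removes only its own index document.
  def destroy
    INDEX.delete(id)
  end
end

class Work < Resource
  def initialize(id)
    super
    # Contained resources like the ones left behind in this issue.
    # The "/list_source" suffix is assumed; "/members" is from the workaround.
    @contained = [
      Resource.new("#{id}/list_source"),
      Resource.new("#{id}/members")
    ]
  end

  # Hypothetical hook: destroy contained resources first, then the work.
  def destroy
    @contained.each(&:destroy)
    super
  end
end

work = Work.new("bw1")
work.destroy
```

With this ordering every contained resource removes its own index entry before the parent goes away, which is what the manual commands achieve. Whether a real model should pay that cost on every destroy is the open question.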

tpendragon commented 8 years ago

So the problem is that the solr document is left behind, yes? Because that indirect container should be very deleted from Fedora.

roryegerton commented 8 years ago

Yes, when I delete the Work. There is an IndirectContainer and an Aggregation::ListSource document still in Solr

tpendragon commented 8 years ago

This is a much more generic problem than Hydra::PCDM. Effectively this is a failure in the sync of Fedora -> Solr. When I delete a root node in Fedora, it deletes all its contained resources as well. However, we don't do anything in that regard in ActiveFedora (with good reason - that'd be slooooww), so the Solr documents stay in place. So how do we deal with it? DO we deal with it?

Is this something we should be eating up the event stream from Fedora to do?
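As a rough sketch of what an event-driven cleanup could look like, assuming delete events carry the id of the removed node and that contained resources share that id as a path prefix (as `"#{bw.id}/members"` does above). The `handle_delete_event` name and the Hash standing in for Solr are hypothetical; this does not use any real Fedora messaging or Solr API.

```ruby
# Hypothetical handler for a "node deleted" event.
# `index` is a Hash of id => document standing in for Solr.
def handle_delete_event(index, deleted_id)
  prefix = "#{deleted_id}/"
  # Remove the node's own document and every document contained under it.
  index.delete_if { |id, _doc| id == deleted_id || id.start_with?(prefix) }
end

index = {
  "bw1"             => { "model" => "BibliographicWork" },
  "bw1/list_source" => { "model" => "ActiveFedora::Aggregation::ListSource" },
  "bw1/members"     => { "model" => "ActiveFedora::IndirectContainer" },
  "bw10"            => { "model" => "BibliographicWork" }
}

handle_delete_event(index, "bw1")
remaining_ids = index.keys
```

Matching on `"bw1/"` rather than `"bw1"` keeps an unrelated work such as `bw10` in the index. Against a real Solr the same idea would presumably be a single delete-by-query on the id prefix rather than a per-document loop.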

jcoyne commented 8 years ago

Is there any use in them going into Solr in the first place? Do these have to be AF::Base objects?

tpendragon commented 8 years ago

The list source being in solr is used for a query, I think, but maybe it doesn't have to be?