samvera-deprecated / curation_concerns

A Hydra-based Rails Engine that extends an application, adding the ability to Create, Read, Update and Destroy (CRUD) objects (based on Hydra::Works) and providing a generator for defining object types with custom workflows, views, access controls, etc.

cherry-pick size calculation change from sufia #485

Closed jcoyne closed 8 years ago

jcoyne commented 8 years ago

https://github.com/projecthydra/sufia/commit/b2625e956efd98c9183d0c7d2407615df12a6c96 and https://github.com/projecthydra/sufia/commit/04ff078b844838d602c47a71db946d5bf6a7229d

grosscol commented 8 years ago

Collections in Sufia 6 aggregate generic_files. Each generic_file has a file_size property, which is apparently indexed into the Solr document for that file. Thus the collection can simply query Solr for the sizes of all the generic_files it contains.

To port this to CC:

  1. Make sure FileSets are indexing the file_size of 'original_file'
  2. Query for the file_size field of the FileSet members of the Work members of the Collection.
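Once step 2 returns the FileSet documents, the total is just a sum over one field. A minimal sketch, assuming the documents come back as plain hashes and that the size is indexed under `file_size_is` (a hypothetical field name; the real one depends on how step 1 indexes it):

```ruby
# Hypothetical Solr field name; the actual name depends on how
# file_size ends up being indexed for FileSets.
FILE_SIZE_FIELD = 'file_size_is'

# Sums the file size over a list of Solr documents (plain hashes).
# Values may be single or multi-valued and may be strings;
# documents without the field count as 0.
def total_file_size(solr_docs, field = FILE_SIZE_FIELD)
  solr_docs.sum { |doc| Array(doc[field]).first.to_i }
end
```

Documents missing the field are treated as zero bytes here rather than raising, which may or may not be what a collection size display wants.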

If feeling extra industrious, benchmark the two ways of getting the works of the collection: querying by relation, or passing a list of the ids of the member works, since we should have them already.

grosscol commented 8 years ago

Starting with a collection, getting all of the FileSets contained by its Works is proving difficult to do with a single Solr request.

This query will return all the FileSets:

query = ActiveFedora::SolrQueryBuilder.construct_query_for_rel(has_model: FileSet.to_class_uri)

Restricting it to only the FileSets from a particular Work is done with the filter query argument:

args = { fq: "{!join from=#{field_name_for_member_ids} to=id}id:#{id_of_work}" }

But we'd have to run that once for each work in the collection. I can't sort out a way to get Solr to chain the results of joins together. It appears that Solr likes being used as an RDB as much as a hammer likes being used as a screwdriver.
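If the ids of the member works are already in hand, as suggested earlier in the thread, one option might be to put all of them into the inner query of a single join instead of running one join per work. A rough sketch, not tested against a real index; `member_ids_ssim` is an assumed field name and `member_join_filter` is a hypothetical helper:

```ruby
# Assumed name of the Solr field on a Work that lists its member ids.
MEMBER_IDS_FIELD = 'member_ids_ssim'

# Builds a Solr join filter that selects documents whose id appears in
# the member-ids field of any of the given works.
# Ids are quoted as-is; this assumes they contain no double quotes.
def member_join_filter(work_ids, from_field = MEMBER_IDS_FIELD)
  raise ArgumentError, 'work_ids must not be empty' if work_ids.empty?

  id_clause = work_ids.map { |id| %("#{id}") }.join(' OR ')
  "{!join from=#{from_field} to=id}id:(#{id_clause})"
end

args = { fq: member_join_filter(%w[work1 work2]) }
```

This still needs a first request to fetch the work ids when they are not already loaded, so it is two requests rather than one per work, and a very large collection could produce an unwieldy filter string.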