Open Natkeeran opened 5 years ago
@Natkeeran can you describe how you would expect these aspects of a repository to be visualized on a chart? Or were you thinking that a table would be the best way to present this information?
@mjordan A simple tabular report would be good. If the report can give a snapshot of the repository, that would be very helpful for system admins, repository/collection admins, decision makers, etc.
number of namespaces
conceptual and real objects by namespace
number of collections
storage taken up (per collection / namespace)
number of conceptual and actual objects per collection
who created it (conceptual object)
when was it created (conceptual object)
who edited last (conceptual object)
last modified date (conceptual object)
AIP generated (conceptual object)
AIP generation date (conceptual object)
number of files and their storage locations per collection/namespace/repository
file format/mime type per collection/namespace/repository
checksum for media
last time checksum was run / any issues
In 7.x we use a solr export and python script to generate this info: https://github.com/digitalutsc/islandora_reports/blob/master/islandora_repo_reports.ipynb
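As a rough illustration of the kind of per-collection snapshot described above, here is a minimal sketch that aggregates exported records into a tabular summary. The record fields (`collection`, `mime_type`, `size_bytes`) are made up for the example and are not the actual Solr schema or the notebook's code.

```python
from collections import Counter, defaultdict

# Hypothetical exported records; field names are illustrative assumptions,
# not a real Solr or Islandora schema.
records = [
    {"collection": "maps", "mime_type": "image/tiff", "size_bytes": 2_000_000},
    {"collection": "maps", "mime_type": "image/jp2", "size_bytes": 500_000},
    {"collection": "theses", "mime_type": "application/pdf", "size_bytes": 1_200_000},
]


def summarize(rows):
    """Return per-collection file counts, total storage, and MIME type counts."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "mime_types": Counter()})
    for row in rows:
        entry = summary[row["collection"]]
        entry["files"] += 1
        entry["bytes"] += row["size_bytes"]
        entry["mime_types"][row["mime_type"]] += 1
    return dict(summary)


report = summarize(records)

# Print one tab-separated row per collection.
for name, entry in sorted(report.items()):
    print(f"{name}\t{entry['files']}\t{entry['bytes']}\t{dict(entry['mime_types'])}")
```

The same grouping could be keyed on namespace instead of collection by changing the key used in `summarize`.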
@Natkeeran you will have noticed that I transferred this issue from the original (now deprecated) incarnation of this module, Preservation Reports. I apologize if my response is a bit out of context.
While I agree that "simple tabular reports would be good," I think putting all of this information in one table stretches "simple" pretty far, at least from the perspective of gathering all the required data at scale. That said, I think most of the items in your list would be fairly easy to query for (except for namespaces, since I'm not sure what that refers to in the Drupal 8 context). The data for others, like "AIP generated" and "AIP generation date", is a bit more challenging to pull since there is currently no standard source for this info. Islandora Bagger could be made to record when a Bag was generated, and the Repository Reports module can pull data from external sources like Islandora Bagger. I'd like to investigate whether we could combine several discrete reports into a larger dataset, assuming we could glue them together with some shared data like node IDs/UUIDs, etc.
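To make the "glue them together" idea a bit more concrete, here is a minimal sketch of merging separate reports on a shared identifier. The report contents, field names, and the `merge_reports` helper are all hypothetical; this is not how Repository Reports or Islandora Bagger currently expose data.

```python
def merge_reports(*reports):
    """Combine several {uuid: {field: value}} reports into one row per UUID."""
    merged = {}
    for report in reports:
        for uuid, fields in report.items():
            # Create the row on first sight, then layer on this report's fields.
            merged.setdefault(uuid, {}).update(fields)
    return merged


# Hypothetical report from the repository itself.
object_report = {
    "uuid-1": {"title": "Map of the harbour", "created_by": "admin"},
    "uuid-2": {"title": "Thesis on soil", "created_by": "editor"},
}

# Hypothetical report from an external source that records Bag creation.
bag_report = {
    "uuid-1": {"aip_generated": True, "aip_date": "2020-01-15"},
}

combined = merge_reports(object_report, bag_report)

for uuid, row in sorted(combined.items()):
    print(uuid, row)
```

Objects that are missing from one source (here `uuid-2` has no Bag record) simply lack those fields in the combined row, which a report could render as "not generated" or leave blank.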
Islandora Riprap provides site admins with fixity data, including a chart showing failed fixity events. It would be interesting to see if we could pull in data from Riprap into a report managed by Repository Reports.
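For the "last time checksum was run / any issues" item, a report would mostly need the most recent event and a failure count per resource. Below is a minimal sketch of that reduction. The event dictionaries and their keys (`resource_id`, `timestamp`, `outcome`) are assumptions for illustration and are not Riprap's actual response format.

```python
from collections import Counter


def summarize_fixity(events):
    """Return the latest check and the number of failures for each resource."""
    latest = {}
    failures = Counter()
    for event in events:
        rid = event["resource_id"]
        if event["outcome"] != "success":
            failures[rid] += 1
        # ISO 8601 timestamps in the same format sort correctly as strings.
        if rid not in latest or event["timestamp"] > latest[rid]["timestamp"]:
            latest[rid] = event
    return {
        rid: {
            "last_checked": event["timestamp"],
            "last_outcome": event["outcome"],
            "failures": failures[rid],
        }
        for rid, event in latest.items()
    }


# Hypothetical fixity events; keys and values are illustrative only.
events = [
    {"resource_id": "media/1", "timestamp": "2020-03-01T00:00:00Z", "outcome": "success"},
    {"resource_id": "media/1", "timestamp": "2020-04-01T00:00:00Z", "outcome": "fail"},
    {"resource_id": "media/2", "timestamp": "2020-04-01T00:00:00Z", "outcome": "success"},
]

fixity = summarize_fixity(events)

for rid, row in sorted(fixity.items()):
    print(rid, row)
```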
@Natkeeran it occurred to me that we could create a set of Views field plugins that generate some of the data you are describing, which could then be used in standard Views. Islandora Riprap already uses this technique. That module provides a "Fixity Auditing" View field plugin that queries Riprap for fixity events on each media item that shows up in Islandora's "Manage Media" view. When the "field" the plugin defines is added to that View, it renders the data like this (it's the right-most field in this example):
I think this is worth exploring, since it builds on Views and provides an easy way for developers to add new types of data later.
@mjordan Thank you. Sorry for this delayed response.
Yes, you are right. It won't be a single source. The plugin approach works well.
In essence, our major use cases are knowing how many actual/conceptual objects we have, how much storage they take up, and preservation-related information such as file formats, checksums, and whether an AIP has been generated, as noted above.
I see that you have created a module. Will evaluate and provide feedback. Thank you.
@mjordan
Here is an initial list of fields/information that would be useful to have in preservation reports: https://docs.google.com/spreadsheets/d/1sFYqzeXZnEDNCnrPGgQR9A73a9wparhhtVnCriWVi1U/edit?usp=sharing
We are currently doing a review of the 7.x stack preservation reports/features, and can provide a more concrete list that would be of local interest in a week or two. Thank you.