justinlittman closed 8 months ago
While you can change the metadata for a file in H2 (e.g. label or visibility) it doesn't seem like H2 currently has the notion of a "changed file" where the file contents of an existing file have changed. To "change" a file, you delete the existing file and then upload a new one with the same name. This creates a new `AttachedFile`. The form to edit a work does not allow a file with the same name as an existing one to be uploaded.
If you're doing the file replacement via Globus, all existing `AttachedFiles` are first deleted.
With this current approach, it would be possible to tell whether there are "changed" files the same way you would check for new files, by checking the `service_name` of the `WorkVersion`'s `AttachedFile` blobs and looking for those that do not have a `service_name` of `preservation`. The existing `WorkVersion.staged_files` method does this using `ActiveStorage::Service::SdrService.accessible?(af.file.blob)`.
To detect whether a file was "changed" by being deleted, you have to compare the current `WorkVersion`'s attached files with the previous `WorkVersion`'s.
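A minimal sketch of the two checks described above, written as plain Ruby rather than against the real H2 models. The `Blob` and `AttachedFile` structs, the `removed_filenames` helper, and the `'local'` service name are hypothetical stand-ins for illustration; the `'preservation'` value comes from the discussion, and the real `staged_files` method may be implemented differently.

```ruby
# Hypothetical stand-ins for H2's models; only the fields used here.
Blob = Struct.new(:service_name)
AttachedFile = Struct.new(:filename, :blob, :hide)

# Assumed service name for files already in preservation, per the discussion.
PRESERVATION_SERVICE = 'preservation'

# New or replaced files: blobs that are not served from preservation.
def staged_files(attached_files)
  attached_files.reject { |af| af.blob.service_name == PRESERVATION_SERVICE }
end

# Deleted files: filenames in the previous version missing from the current one.
def removed_filenames(current_files, previous_files)
  previous_files.map(&:filename) - current_files.map(&:filename)
end

previous = [
  AttachedFile.new('data/a.csv', Blob.new('preservation'), false),
  AttachedFile.new('readme.txt', Blob.new('preservation'), false)
]
current = [
  AttachedFile.new('data/a.csv', Blob.new('preservation'), false),
  AttachedFile.new('data/b.csv', Blob.new('local'), false)
]

staged_files(current).map(&:filename) # => ["data/b.csv"]
removed_filenames(current, previous)  # => ["readme.txt"]
```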
@lwrubel What about additions to the list of files? Or whether files have been hidden or unhidden?
If a file has been added, then there would be a new `AttachedFile`. Replaced files currently look like new files.
If it hasn't already been discussed, I think we should talk about hiding/unhiding as part of the User Version requirements--would changing those file metadata fields mean a user version should be automatically created? While that's a metadata change, I'm guessing the user would expect that to be a new version.
@lwrubel When @andrewjbtw and I discussed changes to the files, I'm not sure we actually covered hiding/unhiding, or changes to the file descriptions. To me, hiding/unhiding seems a significant enough change to warrant a new version. I could go either way on descriptions, but since they're in the file upload section of the UI, it might be easier for users to understand them as all part of "file changes."
I agree that hiding/unhiding makes a new user version because it does change what the citeable data contains.
Some thoughts on how to determine if files have changed on an H2 item and therefore a new User Version should be created:
If someone does a Globus or Zipfile upload, it should always create a new User Version.
Files have been added: check the `service_name` of the WorkVersion's AttachedFile blobs and see if any do NOT have a `service_name` of `preservation`. The existing `WorkVersion.staged_files` method finds preserved files using `ActiveStorage::Service::SdrService.accessible?(af.file.blob)`, so we want to reject those. This is the same for files that have been replaced, since that involves deleting the file and uploading a new version.
Files have been removed: Compare the WorkVersion's `AttachedFile` filenames with the previous H2 version's `AttachedFile` filenames. If there is a filename in the previous version that does not match a filename on the current work version, we could assume there was a deletion. The `filename` includes the path for hierarchical zip and Globus deposits. Multiple copies of a file could be in different directories, so we need to match on the filename (including its path) instead of on checksums.
One or more files have been hidden or unhidden: Compare the `hide` field for each `AttachedFile` with the previous H2 version's `AttachedFile` `hide` field, matching on filename. Some deposits have hundreds of files, so this comparison could stop as soon as the first change is found (meaning we need a new User Version). If any files have been added or removed, we would not need to check for this since the deposit would already meet the criteria for being a new User Version.
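The checks above could be combined roughly as follows. This is a sketch under stated assumptions, not H2 code: `FileRecord`, `files_changed?`, and the `bulk_upload:` flag are hypothetical names, and each record flattens the blob's `service_name` onto the file for brevity. The checks run cheapest-first and return as soon as any change is found, so the `hide` comparison only happens when nothing was added or removed.

```ruby
require 'set'

# Hypothetical flattened view of an AttachedFile and its blob.
FileRecord = Struct.new(:filename, :service_name, :hide)

# Returns true if a new User Version should be created automatically.
# bulk_upload is an assumed flag for Globus or zip file deposits.
def files_changed?(current, previous, bulk_upload: false)
  # Globus and zip uploads always create a new User Version.
  return true if bulk_upload

  # Added or replaced files are not yet in preservation.
  return true if current.any? { |f| f.service_name != 'preservation' }

  # Removed files: a previous filename (path included) no longer present.
  current_names = current.map(&:filename).to_set
  return true if previous.any? { |f| !current_names.include?(f.filename) }

  # Hidden or unhidden files: any? stops at the first difference.
  previous_hide = previous.to_h { |f| [f.filename, f.hide] }
  current.any? { |f| previous_hide[f.filename] != f.hide }
end

previous = [
  FileRecord.new('a.txt', 'preservation', false),
  FileRecord.new('dir/b.txt', 'preservation', false)
]
unchanged = previous.map(&:dup)
hidden = [previous[0].dup, FileRecord.new('dir/b.txt', 'preservation', true)]

files_changed?(unchanged, previous)      # => false
files_changed?(hidden, previous)         # => true
files_changed?([previous[0]], previous)  # => true (dir/b.txt removed)
```

Matching on the full filename rather than a checksum keeps identical files in different directories distinct, as noted above.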
A user version requirement is that when a deposit is performed, if any files have changed since the previous deposited version, a new user version will automatically be created. (If no files have changed the user will be given the option of creating a new user version.)
Please research what approach might be used to determine if any files have changed since the previous deposited version.