Closed peetucket closed 2 years ago
@peetucket One possible implementation is that the unzipping happens entirely within H2; thus H2 is the only system that needs to know if a zip should be extracted. Is there any other accessioning application that needs zip functionality?
Yes, that is one option. We can discuss as a team.
Basically I think the two options are:
Andrew has a preference for option 2, which is how I am currently writing up the tickets, though there will be pros and cons of each. I will modify tickets as the POs and engineers provide feedback.
How will the derivatives for the files in the zip get created so that they can be used by the access systems? How do we deal with 2 zips that both have files with the same names inside of them?
Not sure how the internals would be accessed; perhaps this is not feasible without doing something like expanding the ZIP on the access systems. This could be an area of investigation, to see whether option #2 is even really an option. But the idea is that only a single ZIP would be allowed, to prevent name clashes.
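To make the name-clash concern concrete, here is a minimal sketch (not H2 code) of how two ZIPs with overlapping internal paths would collide if both were expanded into one object, along with the kind of single-ZIP check mentioned above. The function names, the sample file names, and the rule itself are assumptions made for illustration only.

```python
import io
import zipfile
from collections import defaultdict


def make_zip(names):
    """Build a small in-memory ZIP holding the given file names (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, f"contents of {name}")
    buffer.seek(0)
    return buffer


def find_name_clashes(zips):
    """Return {internal_path: [zip labels]} for paths present in more than one ZIP.

    `zips` maps a label (e.g. the upload's file name) to a file-like object.
    """
    owners = defaultdict(list)
    for label, fileobj in zips.items():
        with zipfile.ZipFile(fileobj) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    owners[info.filename].append(label)
    return {path: labels for path, labels in owners.items() if len(labels) > 1}


def validate_single_zip(filenames):
    """Hypothetical upload rule: allow at most one ZIP when expansion is requested."""
    zip_names = [name for name in filenames if name.lower().endswith(".zip")]
    return len(zip_names) <= 1


# Two uploads that both contain data/readme.txt.
uploads = {
    "a.zip": make_zip(["data/readme.txt", "data/table.csv"]),
    "b.zip": make_zip(["data/readme.txt", "images/plot.png"]),
}
clashes = find_name_clashes(uploads)
```

With a single-ZIP rule enforced at upload time, the clash check is never needed, which is the appeal of that restriction. Allowing several ZIPs would require something like `find_name_clashes`, plus a policy for what to do when it finds one.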
We already have objects that have more than one zip file in the repository: https://argo.stanford.edu/view/druid:cw226nt8831
Option 2 is significantly more complex and likely to have unintended / unexpected consequences. Are there use cases to support the preferred implementation?
> We already have objects that have more than one zip file in the repository: https://argo.stanford.edu/view/druid:cw226nt8831
Understood - but in the proposed implementation, this would not be allowed for specific objects (validated at the H2 level).
> Option 2 is significantly more complex and likely to have unintended / unexpected consequences. Are there use cases to support the preferred implementation?
We will need more input from @andrewjbtw (and @amyehodge). Part of the concern behind this approach may have been what happens if a ZIP contains many thousands of files and/or large content that causes problems in accessioning. That is not a user requirement, though, but rather something we could work out in the implementation.
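As one way the "many thousands of files and/or large content" concern could be handled in the implementation, a ZIP can be inspected before it is expanded. This is only a sketch: the limits below are made-up numbers, the function names are hypothetical, and nothing here reflects how H2 or accessioning actually behaves.

```python
import io
import zipfile

# Hypothetical limits, chosen only for illustration.
MAX_FILES = 10_000
MAX_TOTAL_BYTES = 10 * 1024**3


def summarize_zip(fileobj):
    """Return (file_count, total_uncompressed_bytes) without extracting anything.

    Sizes come from the ZIP's central directory, so this is cheap even for
    large archives.
    """
    with zipfile.ZipFile(fileobj) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
        return len(files), sum(info.file_size for info in files)


def within_limits(fileobj, max_files=MAX_FILES, max_total_bytes=MAX_TOTAL_BYTES):
    """Check a ZIP against the (assumed) file-count and size limits."""
    count, total = summarize_zip(fileobj)
    return count <= max_files and total <= max_total_bytes


# Build a tiny in-memory ZIP: three files of five bytes each.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    for name in ("a.txt", "b.txt", "c.txt"):
        archive.writestr(name, "12345")

summary = summarize_zip(sample)
```

The sizes are whatever the archive declares, so a check like this is a pre-flight estimate rather than a guarantee.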
Maybe I misunderstood the conversation, but I thought allowing more than one zip was a reason to use zip over direct folder upload, where uploading multiple folders could get messy. We were going to prohibit uploading more than one folder.
We've passed the threshold for a ticket conversation; sounds like this needs a meeting.
Closing as no longer needed - changing approach to how ZIPs are accessioned via H2.
In sul-dlss/happy-heron#2843, we will be adding an option to allow users to upload a single ZIP file for expansion on the server, giving users access to the individual files. Since systems outside of H2 need to know this option was selected in order to drive display behavior in Argo and sul-embed, we may need to store it in the structural or another part of the cocina model.
The value will be at the object level.
Engineer questions: