mkernik / drum_tools

This repository contains a collection of Python scripts and tools to help with efficiently processing submissions to the Data Repository at the University of Minnesota (DRUM).

Skip embargoed files in the Download All tool #21

Open mkernik opened 2 weeks ago

mkernik commented 2 weeks ago

There is a way to check whether all files are embargoed (accessStatus --> "embargo"), and that check has been added to the tool. But if even one file is not embargoed, the status will be "open.access". I have not found a marker in the API associated with an individual file (at least in the public view) that allows us to identify whether it is embargoed. The tool tries to download files that it cannot access, and it does save objects with the correct names and extensions. When you open those files, though, the content isn't really there. We need to find a way to skip these files. Possible ideas include:

Example where some files are embargoed: https://conservancy.umn.edu/server/api/core/items/bf926686-c2d1-456b-a87b-a4f91fd9df68
Example where all files are embargoed: https://conservancy.umn.edu/server/api/core/items/d806692f-1f9c-4208-82a9-97393d9f0c88
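One way to catch the "file downloads but the content isn't there" case might be to validate each download before keeping it. The sketch below is only a heuristic under assumptions: the function name is hypothetical, it assumes the tool has the HTTP status code of the download response, and it assumes the bitstream metadata exposes an expected size (for example a `sizeBytes` field, which has not been verified against DRUM here).

```python
from typing import Optional


def looks_like_real_file(status_code: int,
                         downloaded_bytes: int,
                         expected_bytes: Optional[int] = None) -> bool:
    """Heuristic: return False when a download is probably a placeholder.

    Hypothetical helper, not part of the existing tool.
    """
    # Assumption: a restricted bitstream does not answer with HTTP 200.
    if status_code != 200:
        return False
    # Assumption: the bitstream metadata reports the true file size,
    # so a mismatch suggests an error body was saved instead of the file.
    if expected_bytes is not None and downloaded_bytes != expected_bytes:
        return False
    # No evidence of a problem; without an expected size this is a weak check.
    return True
```

A file that fails this check could be deleted and reported as skipped rather than left on disk with a misleading name.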

mkernik commented 2 weeks ago

Update to my understanding based on the documentation. By default:

open.access = Item's primary file is downloadable to anonymous users
embargo = Item's primary file is under an embargo

mkernik commented 2 weeks ago

Suboptimal idea: we could open the URL for requesting a copy and see whether the words "You already have access to this file" appear on the page (e.g. https://conservancy.umn.edu/items/bf926686-c2d1-456b-a87b-a4f91fd9df68/request-a-copy?bitstream=4add22bf-eb5a-4367-87e5-1ffc4268a6e5).
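A rough sketch of that check, split into two small pieces: building the request-a-copy URL from the item and bitstream UUIDs, and testing the page text for the phrase. The function names are hypothetical. A big caveat: if the page is rendered client-side with JavaScript, the raw HTML returned by a plain HTTP request may not contain the phrase at all, so this may require a browser-driven fetch.

```python
BASE_URL = "https://conservancy.umn.edu"
ACCESS_PHRASE = "You already have access to this file"


def request_copy_url(item_uuid: str, bitstream_uuid: str,
                     base_url: str = BASE_URL) -> str:
    """Build the request-a-copy URL, following the pattern in the example link."""
    return f"{base_url}/items/{item_uuid}/request-a-copy?bitstream={bitstream_uuid}"


def page_says_accessible(page_text: str) -> bool:
    """Return True if the page text contains the 'already have access' phrase.

    Assumption: the absence of the phrase means the file is restricted.
    This has not been confirmed for every case.
    """
    return ACCESS_PHRASE in page_text
```

The tool would fetch the text at `request_copy_url(...)` for each bitstream and skip the download whenever `page_says_accessible(...)` is False.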

mkernik commented 2 weeks ago

We also might need to rearrange the way the embargo check is currently implemented, since the status is determined by the primary file, not by every file (e.g. https://conservancy.umn.edu/items/f2eba152-86e3-4b46-b50d-977f6c565bb8). My current adjustment was to raise an error and skip all files if the status is "embargo". At the very least, the message needs to change: it should warn the user to check and deal with the files manually, rather than state that "all files connected to this deposit are under embargo".
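A minimal sketch of the message change, assuming the tool already has the item's status string in hand. The function name and the exact wording of the warning are placeholders, not what the tool currently does.

```python
from typing import Optional


def embargo_warning(status: str) -> Optional[str]:
    """Return a warning for an 'embargo' status, or None otherwise.

    Hypothetical helper. The status describes the primary file only,
    so the message avoids claiming that every file is embargoed.
    """
    if status == "embargo":
        return ("The primary file of this deposit is under embargo. "
                "Other files may still be downloadable; "
                "check and download the files manually.")
    # "open.access" only means the primary file is downloadable,
    # so individual files may still need to be checked elsewhere.
    return None
```

The caller could print or log the returned message and skip the automatic download, which keeps the current skip behavior while fixing the misleading wording.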