Open pgwillia opened 2 years ago
blob_id(checked 222749),object_uuid(empty if orphan),file_name,file path,item path
4f1ac054-21e4-463f-94fb-86b56d3b5d52,34a05341-65e6-4732-a341-134559e8475e,8710531-NL22923.pdf,https://era.test.library.ualberta.ca/items/34a05341-65e6-4732-a341-134559e8475e/view/975d9c55-2b63-4730-a096-075568efb797/8710531-NL22923.pdf,https://era.test.library.ualberta.ca/items/34a05341-65e6-4732-a341-134559e8475e
80b30020-6786-4450-a070-b1404a860f8a,96b4e2de-e0da-4225-826a-550d310da6d1,WorkflowEngine.pdf,https://era.test.library.ualberta.ca/items/96b4e2de-e0da-4225-826a-550d310da6d1/view/e8996143-f1de-4255-aa69-9f793eb4717d/WorkflowEngine.pdf,https://era.test.library.ualberta.ca/items/96b4e2de-e0da-4225-826a-550d310da6d1
d436634d-dd40-49b9-ad2b-4439d1e06487,9dd00e26-85a9-4baa-8a46-59f82a821b18,Monica%20Fraser.pdf,https://era.test.library.ualberta.ca/items/9dd00e26-85a9-4baa-8a46-59f82a821b18/view/e01d30e1-adaa-48da-ac7b-32bc70f5047c/Monica-20Fraser.pdf,https://era.test.library.ualberta.ca/items/9dd00e26-85a9-4baa-8a46-59f82a821b18
e1aee2ac-69ca-4fcd-bbe3-17f3cccb55ab,66cf48a0-f6a7-4429-9349-7507a5475953,MM64875-MM94917.pdf,https://era.test.library.ualberta.ca/items/66cf48a0-f6a7-4429-9349-7507a5475953/view/3d484bad-4a13-43c8-9605-8acbe2783c84/MM64875-MM94917.pdf,https://era.test.library.ualberta.ca/items/66cf48a0-f6a7-4429-9349-7507a5475953
28a60087-6956-4bee-b247-b804a405c251,,Zhenhua_Li-PhD_thesis_-_submission.pdf
78c401bb-004e-4433-b544-dcb0e2a276c1,,Monica%20Fraser.pdf
c9b4bfe5-ccb9-4674-932c-8a420d1895bf,,scan.pdf
ecd2b144-d74b-46d3-8429-91c6b82b40bf,,WorkflowEngine.pdf
These are the problematic files I have found so far: their MIME type (the blob's content_type) is 'application/pdf', yet the associated file does not have a '.pdf' extension. The second half of the list are orphans and would be cleaned up by the garbage collect orphan blobs rake task. I will loosen the parameters to hopefully catch more by cross-checking every blob's content_type against the expected extension through a look-up table.
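As a rough illustration of the look-up-table cross-check described above, here is a minimal Python sketch. It is not the script that produced the report. The table contents, the function names (`extension_matches`, `find_mismatches`), and the sample rows are assumptions made for illustration; only the `blob_id`, `file_name`, and `content_type` field names come from the report and the description above.

```python
import os
from urllib.parse import unquote

# Hypothetical look-up table: content type -> acceptable file extensions.
# The real table would need to cover every content type present in the blobs.
EXPECTED_EXTENSIONS = {
    "application/pdf": {".pdf"},
    "image/jpeg": {".jpg", ".jpeg"},
    "image/png": {".png"},
    "text/plain": {".txt"},
}


def extension_matches(content_type, file_name):
    """Return True or False for known content types, None for unknown ones."""
    expected = EXPECTED_EXTENSIONS.get(content_type)
    if expected is None:
        return None  # not in the table, so no judgment is made
    # Decode percent-escapes (e.g. %20) before splitting off the extension.
    extension = os.path.splitext(unquote(file_name))[1].lower()
    return extension in expected


def find_mismatches(blobs):
    """Keep only blobs whose extension contradicts their content type."""
    return [
        blob
        for blob in blobs
        if extension_matches(blob["content_type"], blob["file_name"]) is False
    ]


# Made-up sample rows, shaped like the report columns above.
sample_blobs = [
    {"blob_id": "a", "content_type": "application/pdf", "file_name": "scan.pdf"},
    {"blob_id": "b", "content_type": "application/pdf", "file_name": "thesis"},
    {"blob_id": "c", "content_type": "application/x-unknown", "file_name": "data.bin"},
]

mismatched_ids = [blob["blob_id"] for blob in find_mismatches(sample_blobs)]
```

Returning `None` for content types missing from the table keeps unknown types out of the mismatch list, so the table can be extended gradually without flooding the report with false positives.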
These items might
We are looking to identify any items or theses that may have this issue and create a report with at least the id, title and url.
If we can figure out a pattern to match, this might be a good second-round question for https://docs.google.com/document/d/1kjBhKqekIuH4VD_FFz1B668gumrJiHluoMh8fEh4Ktc/edit#heading=h.2fir5v5sus5t
Originally posted by @pgwillia in https://github.com/ualbertalib/digital-preservation/issues/45#issuecomment-1249716960
Related: https://github.com/ualbertalib/digital-preservation/issues/45 and https://github.com/ualbertalib/jupiter/issues/2043