Closed Digital-Grinnell closed 3 years ago
@Digital-Grinnell As a generic (non-GitHub-specific) solution, I wonder if just having Workbench strip everything including and after a ? in the filenames of downloaded files would be sufficient? Using your example, if the remote filename is Video_03.mp4?raw=true, the version that Workbench saves would be Video_03.mp4.
Yes, that was what I was thinking too. Based on my experience, I think it would work.
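For illustration, here is a minimal sketch of the stripping approach proposed above, using only the Python standard library. The function names local_filename_from_url and strip_query_suffix are hypothetical and are not taken from Workbench's code; the actual fix in the issue-260 branch may be implemented differently.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string.

    Hypothetical helper for illustration only; not Workbench's own code.
    """
    # urlsplit() separates the path from "?raw=true" and any "#fragment".
    path = urlsplit(url).path
    # basename() keeps only the final segment; unquote() decodes %20 etc.
    return unquote(posixpath.basename(path))


def strip_query_suffix(filename: str) -> str:
    """Drop everything from the first "?" onward in a bare filename."""
    return filename.split("?", 1)[0]
```

With the example from this thread, strip_query_suffix("Video_03.mp4?raw=true") gives Video_03.mp4, and local_filename_from_url() gives the same kind of result when handed a full URL ending in ?raw=true.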
@Digital-Grinnell can you test the issue-260 branch to see if it resolves the problem?
Edit: hold off, I broke something else.
@Digital-Grinnell OK, the issue-260 branch is ready to test, if you can.
I pulled the issue-260 branch and subsequently ran a brief test last evening, an ingest of video content that featured three different file references in the following forms:
All three ingested successfully this time! The output from my test run follows.
╭─markmcfate@MAC02NX13MG5RP ~/GitHub/islandora_workbench ‹ruby-2.3.0› ‹issue-260*›
╰─$ ./workbench --config icg_testing-video.yml
OK, connection to Drupal at https://icg-islandora.williams.edu verified.
Warning: Media creation in your version of Drupal (8.9.14) is less reliable than in Drupal 9.2 or higher.
Node for "A Brief History of Acceleration" (record 20116) created at https://icg-islandora.williams.edu/node/69.
+ Media for https://github.com/Islandora-Collaboration-Group/islandora-sample-objects/raw/master/VIDEO/Video_01/Video_01.mp4 created.
Node for "Blues for C.M." (record 20117) created at https://icg-islandora.williams.edu/node/70.
+ Media for https://github.com/Islandora-Collaboration-Group/islandora-sample-objects/blob/master/VIDEO/Video_02/Video_02.mp4?raw=true created.
Node for "Olivia's Arrival" (record 20118) created at https://icg-islandora.williams.edu/node/71.
+ Media for Video_03.mp4 created.
If I have time this afternoon I'll run a similar test using PDFs. Looking good.
Just did a PDF ingest test using the issue-260 branch of Workbench and had mixed results. It looks like the file handling works properly, as I tested this time with references like this:
All three objects were created but I got NO media again. However, this time the media errors in the logs are of the form:
11-May-21 09:06:43 - INFO - Node for A search for antigens common to fetal and tumor cells (record 21001) created at https://icg-islandora.williams.edu/node/76.
11-May-21 09:06:44 - ERROR - Media not created, PUT request to "https://icg-islandora.williams.edu/node/76/media/document/16" returned an HTTP status code of "404".
The two URL file references left behind viable intermediate directories and PDF documents, so that's a good sign. Another member of our ICG testing team is bringing this to DKC's attention now.
Unlike earlier tests that returned 404 errors, this ingest was performed using an admin account that had sufficient privileges to successfully create media for other content types.
Can you run curl -v -uadmin:islandora "https://icg-islandora.williams.edu/islandora_workbench_integration/core_version" replacing the credentials with the same ones used in your config file and let me know what comes back? Should be something like {"core_version":"9.3.0-dev"}.
The 404 is being generated by the Islandora media REST endpoint, but that endpoint has to exist, otherwise you'd see media not being created in general. Not sure what's going on yet.
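If it helps when scripting that check, a response body of the form {"core_version":"9.3.0-dev"} can be reduced to a comparable version number as sketched below. This is only an assumption-laden illustration: the function name drupal_major_minor is hypothetical, and this is not how Workbench itself parses the value.

```python
import json
from typing import Tuple


def drupal_major_minor(body: str) -> Tuple[int, int]:
    """Extract (major, minor) from a body like {"core_version":"9.3.0-dev"}.

    Hypothetical helper for illustration only; not Workbench's own code.
    """
    version = json.loads(body)["core_version"]
    # Only the first two dot-separated components are needed; the patch
    # component may carry a suffix such as "0-dev", so it is ignored.
    major, minor = version.split(".")[:2]
    return int(major), int(minor)
```

Tuples compare element by element, so a check such as drupal_major_minor(body) >= (9, 2) distinguishes the Drupal versions mentioned in the warning from the test run earlier in this thread.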
Here are the results...
╭─markmcfate@MAC02NX13MG5RP ~/GitHub/islandora_workbench ‹ruby-2.3.0› ‹issue-260*›
╰─$ curl -v -uadmin:xxxxxxxxxxxxx "https://icg-islandora.williams.edu/islandora_workbench_integration/core_version"
GET /islandora_workbench_integration/core_version HTTP/2
Host: icg-islandora.williams.edu
Authorization: Basic [redacted]
User-Agent: curl/7.64.1
Accept: */*
OK, thanks. Can you pull in the latest updates to Islandora Workbench from the main branch and try to ingest those PDFs again?
@McFateM, can you confirm that you have a "Document" media type configured (/admin/structure/media/manage/document)?
If that's the issue here, we can make Workbench confirm that the media type exists during its --check phase.
@Digital-Grinnell OK to close this issue, since you've been able to ingest files whose URLs have a ? query string? I've opened #269 to address verifying media types exist.
Yes, by all means. Thanks!
Working on ICG's test of I8 and we have a Google Sheet with media/file references like this:
This was for an object to which I assigned a unique ID of 1100118. The "intermediate" file left behind in my input_data folder was correspondingly named 1100118/Video_03.mp4?raw=true. That intermediate file was a viable .mp4 video because I was able to play it locally once I removed the ?raw=true suffix from the filename. Unfortunately, Workbench was subsequently unable to upload the viable "intermediate" file, presumably because the filename still had the ?raw=true suffix.
So, I changed the entry in the Google Sheet to read
https://github.com/Islandora-Collaboration-Group/islandora-sample-objects/blob/master/VIDEO/Video_03/Video_03.mp4
thinking that would solve the problem. It did not. That URL returned an HTML response and I subsequently got the following in my log file: