Closed davidschober closed 7 years ago
@bmquinn @kdid @csyversen or @carrickr can you give me a rough estimate of the work involved? We discussed this during a meeting a while ago, but it never got fleshed out.
Note this was originally discussed in #351
Just to be clear: Posters from Herskovits and WW2 Posters are for DPLA. @jenyoung is the contact for DPLA. Karen Miller and Geoff Morse would like to add the public collections from Images to Primo, so if all of the other ones can go to them, that would be great. If we have to limit it to a few, @Nic4Images would be the person to select which public collections (VRA XML records) should be extracted. Karen would also like some help on how to create the thumbnail URL in IIIF speak. @MetadataKaren
If I can add my 2 cents here, "a dead simple view listing pids linking to VRA records" is not sufficient to our needs. We have XSL files, we need XML files.
We will be very happy to have a file for every public image in the repository (although we do NOT need those for the Africana or Afri posters). There is no need to limit the number of files on our behalf.
Also, when we made this request for Winterton MODS records, Mike Stroming put together a Rails job that he believed could be used to do future record exports of the same ilk. My understanding is that Michael North has this utility and/or knows where to find it. This seems to me like it might be of use here.
@MetadataKaren my hope was that a listing of links to the XML files would allow someone on your team to programmatically get the data in a more flexible way. If that doesn't work, we will pursue the other route.
Can you list the public collections you need records for?
@davidschober, in a nutshell, all of the public collections that are listed on the home page for the Images Repo, with the exception of the WWII and Africana posters. In detail:
Hamid Naficy Iranian Movie Posters Collection
Ramon Casas sketchbooks
University Archives Postcards
Alexander Hesler Photograph Collection
Vernon McKay Photographs
Wilmo Engine Company dealer’s sample catalog
Rob Linrothe Image Collection
Une cite moderne dessins by Rob Mallet-Stevens
Justine Cordwell Collection
Department of Art History Study Photographs Collection
Jim Roberts Photographs 1968-1972
Kate and Lou. Souvenir of auto trip to San Francisco 1915
George Silas Duntley 1879-1957 Photographs circa 1899-1918
Ira Silverman Railroad Menu Collection
Depictions of Africa in French humour magazines
WPA Digital Collection 1935-1943
Photographs of Zanzibar
Ifeoma Onyefulu photographs
United States Army Base Hospital Number 12 World War I and II Records
The Siege and Commune of Paris 1870-1871
La Caricature 1830-1835 Charles Philipon founding editor
La Caricature 1880-1893 1899 Albert Robida founding editor
@bmquinn @kdid @csyversen or @carrickr hold off until this gets discussed more. See note below.
@MetadataKaren and team, an end-of-the-day discussion -- what if, for now, we create handles for the public collection URLs and those are used in Primo? They would go in as collection-level rather than item-level records.
Benefits:
Other thoughts:
Let's meet to discuss!
I suppose we could do that; I’ll ask Geoff if that’s something he had in mind.
However, it is a lot of work for me, as I’ll have to create a PNX record for each collection. Using the individual images, on the other hand, makes use of existing metadata and is minimal work for me. Also, although the finding aid records do represent a collection, the entire finding aid text is stuffed into a search field in the PNX, so the finding aid is full text searchable in Primo. You can search for a name that appears once deep down inside a finding aid and Primo will return the finding aid in search results. This, of course, wouldn’t be possible for an image collection record.
Finally, I’m not too worried about the URLs changing for these. We can delete them and re-import new records for them when they’re in the new repository. This is something we’re going to have to do for finding aids as well.
Karen
I just checked with Geoff Morse about these, since the original proposal to ingest public image records was his. His opinion is that the collection level records will not be very useful for the user. He and I strongly prefer individual records.
He also said he's fairly certain that the additional 15k records will make no difference to Primo's cost. They are, in his words, "a drop in the bucket" compared to the 2 million plus Hathi records we added.
I saw Jen's email after Carolyn's, so here are some comments on that:
I am not particularly concerned with synchronizing new items or edited or deleted items -- there is already a mechanism for this on the Primo end. Yes, it does mean more exports from RDC, so that's an issue. But this is something we do for finding aids now, and I would be very happy to help develop a workflow for determining which records are new or updated (I did this using ITQL when finding aids were in Fedora).
Using collection level records will not make adding new collections to Primo any quicker than adding individual records -- it would make it more difficult to add new collections to Primo, because I would need to create a new PNX record for each new collection.
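One way the "which records are new or updated" step could be sketched is by comparing two successive exports by content hash. This is only an illustration under assumptions not confirmed in this thread: each export is a directory of VRA XML files, and a record keeps the same filename from one export to the next. The function names `digest_export` and `diff_exports` are hypothetical.

```python
import hashlib
from pathlib import Path


def digest_export(export_dir):
    """Map each XML filename in export_dir to a SHA-256 digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(export_dir).glob("*.xml"))
    }


def diff_exports(old_dir, new_dir):
    """Return (added, updated, deleted) filenames between two exports.

    Assumes a record keeps the same filename across exports (hypothetical).
    """
    old = digest_export(old_dir)
    new = digest_export(new_dir)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # A file present in both exports counts as updated if its bytes changed.
    updated = sorted(name for name in new.keys() & old.keys() if new[name] != old[name])
    return added, updated, deleted
```

The three lists could then drive the existing add, replace, and delete steps on the Primo side.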
OK, @MetadataKaren, it sounds like you have covered all our concerns, and since you can easily delete these when we migrate, let's go ahead and get what is needed. @jenyoung won't have any time until May if you need crosswalk input, and the developers need to fit this in with other priorities, since we had planned on this back in Feb and the window was missed. Is a month a decent deadline, @csyversen @bmquinn @kdid? Going forward with NextGen, we will need to figure out how to automate this with ResourceSync, so @davidschober will add that to the backlog and be in touch.
Hi @MetadataKaren, here's a box folder with a zip file of the ~15,000 public image VRA records. Stay tuned next week for information about a (mostly) automated way to grab these in the future that @kdid and I are working on.
Brendan,
Thank you very much! That is terrific news.
I'm home with a nasty virus today, so I'll have a look at these when I get back. Geoff and I are excited to be moving forward with this.
Also, any guidance you can give us on how to use the filename or PID or something in the file to build a URL of a thumbnail for display in search results would be much appreciated.
Karen
Hi Karen, IIIF is what we use to generate the thumbnails for the collections.
You would just substitute the pid of the image and change the colon to a dash.
Example:
For pid: inu:dil-bd6ef6a0-06c6-41e2-8e9f-e0bc0827f750
URL format for thumbnail: http://images.northwestern.edu/image-service/inu-dil-bd6ef6a0-06c6-41e2-8e9f-e0bc0827f750/square/,300/0/default.jpg
There are other ways to generate a thumbnail using IIIF, but this seems to be the easiest. Full spec: http://iiif.io/api/image/2.1/#region
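For anyone scripting this, the substitution described above could be sketched roughly as follows. The base URL and the `square/,300/0/default.jpg` pattern come from the example in this comment; the function name `thumbnail_url` and the `height` parameter are hypothetical, added only for illustration.

```python
def thumbnail_url(pid: str, height: int = 300) -> str:
    """Build a IIIF thumbnail URL from a PID such as 'inu:dil-...'."""
    base = "http://images.northwestern.edu/image-service"
    # The image identifier is the PID with the colon changed to a dash.
    identifier = pid.replace(":", "-")
    # IIIF Image API path: {region}/{size}/{rotation}/{quality}.{format}
    # 'square' is the region and ',{height}' scales to the given height.
    return f"{base}/{identifier}/square/,{height}/0/default.jpg"


print(thumbnail_url("inu:dil-bd6ef6a0-06c6-41e2-8e9f-e0bc0827f750"))
```

Run against the example PID, this prints the same thumbnail URL given above.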
Thanks, Jen. That looks easy enough.
Karen
@MetadataKaren - We have performance issues with the image server on the Images application (until we move to NextGen). Would it be possible for you to use the same call that we already use for the thumbs, since these are likely already cached by the application? Not sure of your height/width requirements. If not, we may need to talk further.
This is the call (example for each horizontal and vertical)
Yes, we can use those - that is no problem at all.
@MetadataKaren it looks like you got what you need. Is it OK if I close this?
@davidschober, yes, this can be closed. I am curious, though, about the "(mostly) automated way to grab these in the future" that @bmquinn and @kdid are working on.
@MetadataKaren - coming soon. We've not yet pushed that to production. We will update you soon.
Excellent! I'm looking forward to it. Thank you for the update!
Oops, I did not mean to close this! Sorry about that.
We will go over this at the meeting today.
https://github.com/nulib/repodev_planning_and_docs/wiki/Public-Collections-VRA-and-Thumbnail-Access
Closing, as the meeting happened and all seem OK.
Descriptive summary
The metadata group / DPLA wants to put public collections into Primo. We don't have an OAI-PMH handler, so we've been asked to dump the VRA XML and give notes on how to construct a thumbnail URL (using the IIIF endpoint).
Done looks like
The original estimate in #351 was three days. Is that still accurate (24 dev hours)?