nulib / images

Hydra based image application
https://images.northwestern.edu/

Get VRA xml from public collections #381

Closed davidschober closed 7 years ago

davidschober commented 7 years ago

Descriptive summary

The metadata group / DPLA wants to put public collections into Primo. We don't have an OAI-PMH handler, so we've been asked to dump the VRA XML and give notes on how to construct a thumbnail (using the IIIF endpoint).

Done looks like

The original estimate in #351 was three days. Is that still accurate (24 dev hours)?

davidschober commented 7 years ago

@bmquinn @kdid @csyversen or @carrickr can you give me a rough estimate of the work involved? We discussed this during a meeting a while ago, but it never got fleshed out.

davidschober commented 7 years ago

Note this was originally discussed in #351

ccaizzi commented 7 years ago

Just to be clear: Posters from Herskovits and WW2 Posters are for DPLA. @jenyoung is the contact for DPLA. Karen Miller and Geoff Morse would like to add the public collections from Images to Primo, so if all of the other ones can go to them, that would be great. If we have to limit it to a few, @Nic4Images would be the person to select which public collections (VRA XML records) should be extracted. Karen would also like some help on how to create the thumbnail URL in IIIF speak. @MetadataKaren

MetadataKaren commented 7 years ago

If I can add my 2 cents here, "a dead simple view listing pids linking to VRA records" is not sufficient to our needs. We have XSL files, we need XML files.

We will be very happy to have a file for every public image in the repository (although we do NOT need those for the Africana or Afri posters). There is no need to limit the number of files on our behalf.

Also, when we made this request for Winterton MODS records, Mike Stroming put together a Rails job that he believed could be used to do future record exports of the same ilk. My understanding is that Michael North has this utility and/or knows where to find it. This seems to me like it might be of use here.

davidschober commented 7 years ago

@MetadataKaren my hope was that a listing of links to the XML files would allow someone on your team to get the data programmatically, in a more flexible way. If that doesn't work, we will pursue the other route.

Can you list the public collections you need records for?

MetadataKaren commented 7 years ago

@davidschober, in a nutshell, all of the public collections that are listed on the home page for the Images Repo, with the exception of the WWII and Africana posters. In detail:

- Hamid Naficy Iranian Movie Posters Collection
- Ramon Casas sketchbooks
- University Archives Postcards
- Alexander Hesler Photograph Collection
- Vernon McKay Photographs
- Wilmo Engine Company dealer’s sample catalog
- Rob Linrothe Image Collection
- Une cite moderne dessins by Rob Mallet-Stevens
- Justine Cordwell Collection
- Department of Art History Study Photographs Collection
- Jim Roberts Photographs 1968-1972
- Kate and Lou. Souvenir of auto trip to San Francisco 1915
- George Silas Duntley 1879-1957 Photographs circa 1899-1918
- Ira Silverman Railroad Menu Collection
- Depictions of Africa in French humour magazines
- WPA Digital Collection 1935-1943
- Photographs of Zanzibar
- Ifeoma Onyefulu photographs
- United States Army Base Hospital Number 12 World War I and II Records
- The Siege and Commune of Paris 1870-1871
- La Caricature 1830-1835 Charles Philipon founding editor
- La Caricature 1880-1893 1899 Albert Robida founding editor

ccaizzi commented 7 years ago

@bmquinn @kdid @csyversen or @carrickr hold off until this gets discussed more. See note below.

jenyoung commented 7 years ago

@MetadataKaren and team, an end-of-the-day discussion: what if, for now, we create handles for the public collection URLs and use those in Primo? They would go in as collections rather than at item level.

Benefits:

Other thoughts:

Let's meet to discuss!

MetadataKaren commented 7 years ago

I suppose we could do that; I’ll ask Geoff if that’s something he had in mind.

However, it is a lot of work for me, as I’ll have to create a PNX record for each collection. Using the individual images, on the other hand, makes use of existing metadata and is minimal work for me. Also, although the finding aid records do represent a collection, the entire finding aid text is stuffed into a search field in the PNX, so the finding aid is full text searchable in Primo. You can search for a name that appears once deep down inside a finding aid and Primo will return the finding aid in search results. This, of course, wouldn’t be possible for an image collection record.

Finally, I’m not too worried about the URLs changing for these. We can delete them and re-import new records for them when they’re in the new repository. This is something we’re going to have to do for finding aids as well.

Karen

MetadataKaren commented 7 years ago

I just checked with Geoff Morse about these, since the original proposal to ingest public image records was his. His opinion is that the collection level records will not be very useful for the user. He and I strongly prefer individual records.

He also said he's fairly certain that the additional 15k records will make no difference to Primo's cost. They are, in his words, "a drop in the bucket" compared to the 2 million plus Hathi records we added.

I saw Jen's email after Carolyn's, so here are some comments on that:

I am not particularly concerned with synchronizing new items or edited or deleted items -- there is already a mechanism for this on the Primo end. Yes, it does mean more exports from RDC, so that's an issue. But this is something we do for finding aids now, and I would be very happy to help develop a workflow for determining which records are new or updated (I did this using ITQL when finding aids were in Fedora).

Using collection level records will not make adding new collections to Primo any quicker than adding individual records -- it would make it more difficult to add new collections to Primo, because I would need to create a new PNX record for each new collection.

ccaizzi commented 7 years ago

OK, @MetadataKaren, it sounds like you covered all our concerns, and since you can easily delete these when we migrate, let's go ahead and get what is needed. @jenyoung won't have any time until May if you need crosswalk input, and the developers need to fit this in with other priorities, since we had planned on this back in Feb and the window was missed. So is a month a decent deadline, @csyversen @bmquinn @kdid? Going forward with NextGen, we will need to figure out how to automate this with ResourceSync, so @davidschober will add that to the backlog and be in touch.

bmquinn commented 7 years ago

Hi @MetadataKaren, here's a box folder with a zip file of the ~15,000 public image VRA records. Stay tuned next week for information about a (mostly) automated way to grab these in the future that @kdid and I are working on.

MetadataKaren commented 7 years ago

Brendan,

Thank you very much! That is terrific news.

I'm home with a nasty virus today, so I'll have a look at these when I get back. Geoff and I are excited to be moving forward with this.

Also, any guidance you can give us on how to use the filename or PID or something in the file to build a URL of a thumbnail for display in search results would be much appreciated.

Karen

Karen D. Miller Monographic Cataloger/Metadata Specialist Northwestern University Libraries Northwestern University 1970 Campus Drive Evanston, IL 60208 www.library.northwestern.edu k-miller3@northwestern.edu 874.467.3462


From: Brendan Quinn notifications@github.com Sent: Friday, April 14, 2017 9:18 PM To: nulib/images Cc: Karen Miller; Mention Subject: Re: [nulib/images] Get VRA xml from public collections (#381)


jenyoung commented 7 years ago

Hi Karen, IIIF is what we use to generate the thumbnails for the collections.

You would just substitute the pid of the image and change the colon to a dash.

Example:

For pid: inu:dil-bd6ef6a0-06c6-41e2-8e9f-e0bc0827f750

URL format for thumbnail: http://images.northwestern.edu/image-service/inu-dil-bd6ef6a0-06c6-41e2-8e9f-e0bc0827f750/square/,300/0/default.jpg

There are other ways to generate a thumbnail using IIIF, but this seems to be the easiest. Full spec: http://iiif.io/api/image/2.1/#region
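The substitution described above can be sketched in a few lines of Python. This is only an illustration, not code from the Images application: the function name `thumbnail_url` and its `region`/`size` parameters are made up for the example, and the defaults simply reproduce the `square/,300` pattern from the example URL.

```python
# Base of the IIIF image service, taken from the example URL in this thread.
BASE_URL = "http://images.northwestern.edu/image-service"


def thumbnail_url(pid: str, region: str = "square", size: str = ",300") -> str:
    """Build a IIIF thumbnail URL for a PID (hypothetical helper).

    The only transformation applied to the PID is the one described
    above: the colon is changed to a dash.
    """
    identifier = pid.replace(":", "-")
    # IIIF Image API path order: {identifier}/{region}/{size}/{rotation}/{quality}.{format}
    return f"{BASE_URL}/{identifier}/{region}/{size}/0/default.jpg"


print(thumbnail_url("inu:dil-bd6ef6a0-06c6-41e2-8e9f-e0bc0827f750"))
```

In IIIF size syntax, `,300` asks for a height of 300 pixels with the width scaled to match.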

MetadataKaren commented 7 years ago

Thanks, Jen. That looks easy enough.

Karen


On Apr 17, 2017, at 1:49 PM, Jen Young notifications@github.com wrote:


kdid commented 7 years ago

@MetadataKaren - We have performance issues with the image server on the Images application (until we move to NextGen). Would it be possible for you to use the same call that we already use for the thumbs, since these are likely already cached by the application? Not sure of your height/width requirements. If not, we may need to talk further.

This is the call (example for each horizontal and vertical)

http://images.northwestern.edu/image-service/inu-dil-06c13960-3a2b-4bf3-b4dc-b6c2abec6266/full/,120/0/default.jpg

http://images.northwestern.edu/image-service/inu-dil-41913a91-037f-494b-9113-06004a8a98fb/full/,120/0/default.jpg
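For a batch of records, the cached form of the call could be generated along these lines. This is a rough sketch under assumptions: `cached_thumbnail_url` is a hypothetical name, and the colon-form PIDs in the list are inferred by reversing the dash substitution in the two URLs above rather than read from actual VRA records.

```python
# Base of the image service, as it appears in the two example URLs above.
IMAGE_SERVICE = "http://images.northwestern.edu/image-service"


def cached_thumbnail_url(pid: str) -> str:
    """Return the full/,120 thumbnail URL for a PID (hypothetical helper)."""
    identifier = pid.replace(":", "-")  # colon in the PID becomes a dash
    return f"{IMAGE_SERVICE}/{identifier}/full/,120/0/default.jpg"


# Assumed PIDs, reconstructed from the example URLs for illustration only.
pids = [
    "inu:dil-06c13960-3a2b-4bf3-b4dc-b6c2abec6266",
    "inu:dil-41913a91-037f-494b-9113-06004a8a98fb",
]

for pid in pids:
    print(cached_thumbnail_url(pid))
```

Because `full/,120` fixes only the height, horizontal and vertical images come back with different widths, which is presumably why one example of each is given.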

MetadataKaren commented 7 years ago

Yes, we can use those - that is no problem at all.


On Apr 18, 2017, at 9:44 AM, kdid notifications@github.com wrote:


davidschober commented 7 years ago

@MetadataKaren it looks like you got what you need. Is it OK if I close this?

MetadataKaren commented 7 years ago

@davidschober, yes, this can be closed. I am curious, though, about the "(mostly) automated way to grab these in the future" that @bmquinn and @kdid are working on.

kdid commented 7 years ago

@MetadataKaren - coming soon. We've not yet pushed that to production. We will update you soon.

MetadataKaren commented 7 years ago

Excellent! I'm looking forward to it. Thank you for the update!

MetadataKaren commented 7 years ago

Oops, I did not mean to close this! Sorry about that.

kdid commented 7 years ago

We will go over this at the meeting today.

https://github.com/nulib/repodev_planning_and_docs/wiki/Public-Collections-VRA-and-Thumbnail-Access

davidschober commented 7 years ago

Closing, as the meeting happened and all seems OK.