Harvest full archival descriptions (in EAD / ArchivesSpace) for collections in RB and IA

flapka commented 4 years ago

For milestone 4 (after initial release):

ArchivesSpace is the data source for full description of archival collections in Rare Books and Manuscripts and Institutional Archives. The native data format for these descriptions is EAD (Encoded Archival Description). Unlike library/archival material described in MARC (Voyager), these descriptions are not currently provided in a YUL OAI feed. The optimum method for harvesting the EAD is TBD, but one possibility is the existing GitHub repository: https://github.com/YaleArchivesSpace/Archives-at-Yale-EAD3

We obviously need additional discussion to determine the precise manner for integration of archival descriptions -- in terms of both the desired outcomes in the catalog and the processes for reaching those outcomes. This GitHub issue is intended as a log for documenting our progress with those discussions.

On the question of outcomes, Emmanuelle noted the Georgia O'Keeffe Museum's integration of archival material in its collection catalog: https://collections.okeeffemuseum.org/object/ It's not immediately clear whether those archival collections are described in EAD (though that's perhaps irrelevant).

The Smithsonian is another example. Their collections catalog includes material from multiple archival repositories, including the SI's Institution Archives: https://collections.si.edu/search/results.htm?q=paul+mellon

Neither of the above implementations is necessarily the ideal.

flapka commented 3 years ago

@edgartdata @rchatalbash @yulgit1 Following this morning's discussion, I'll offer some suggestions on next steps.

I think the treatment of archival collections in LUX can inform our work to a large extent, as the desired outcomes are generally the same: i.e. expose the entire depth of finding aid descriptions by atomizing all the components of a finding aid description (1 finding aid = many object descriptions/records). But since the underlying LUX schema varies in significant ways from the YCBA schema, I suggest it'd be most useful to use our test Blacklight instance as a development sandbox.

For data stream, I believe YUL pushes nightly updates to GitHub: https://github.com/YaleArchivesSpace/Archives-at-Yale-EAD3. Each finding aid is represented by an EAD-encoded XML file. YCBA finding aids are currently sorted into two folders. These are a few details (among many) that we should confirm with Mark Custer before proceeding. @yulgit1 Does this look like an actionable data source from your point of view?
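To make the "1 finding aid = many records" idea concrete, here is a minimal sketch of atomizing one EAD file into per-component records. It assumes EAD3-style `<archdesc>`, `<dsc>`, `<c>`/`<c01>`, `<did>` and `<unittitle>` elements; the sample document, the `atomize` function and the record keys are hypothetical and far simpler than a real YCBA finding aid.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD3 sample; real finding aids carry far more detail.
SAMPLE_EAD = """<ead xmlns="http://ead3.archivists.org/schema/">
  <archdesc level="collection">
    <did><unittitle>Example papers</unittitle></did>
    <dsc>
      <c level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c level="file">
          <did><unittitle>Letters, 1790</unittitle></did>
        </c>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def local_name(element):
    """Return the tag name without its XML namespace."""
    return element.tag.rsplit("}", 1)[-1]


def is_component(element):
    """True for <c> and numbered <c01>..<c12> components."""
    name = local_name(element)
    return name == "c" or (len(name) == 3 and name[0] == "c" and name[1:].isdigit())


def title_of(element):
    """Return the unittitle text inside the element's own <did>, if any."""
    for child in element:
        if local_name(child) == "did":
            for grandchild in child:
                if local_name(grandchild) == "unittitle":
                    return "".join(grandchild.itertext()).strip()
    return ""


def atomize(ead_xml):
    """Flatten one finding aid into a list of records, one per level."""
    root = ET.fromstring(ead_xml)
    archdesc = next(e for e in root.iter() if local_name(e) == "archdesc")
    records = []

    def walk(element, ancestors):
        title = title_of(element)
        records.append(
            {"title": title, "level": element.get("level"), "ancestors": list(ancestors)}
        )
        for child in element:
            if local_name(child) == "dsc":
                # Components of the collection live under <dsc>.
                for grandchild in child:
                    if is_component(grandchild):
                        walk(grandchild, ancestors + [title])
            elif is_component(child):
                walk(child, ancestors + [title])

    walk(archdesc, [])
    return records
```

Calling `atomize(SAMPLE_EAD)` yields one record for the collection, one for the series and one for the file, each carrying the titles of its ancestors so that the context tree can be rebuilt downstream.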

For data transformation (EAD to YCBA schema), I imagine we needn't start from scratch. It's unclear whether the archives-to-LUX transformation could serve as the best jumping off point. Another question for Mark. If not, it's possible that existing XSL (such as Mark's EAD-to-MODS: https://github.com/fordmadox/EAD-to-MODS/blob/master/yale.ead2mods-allLevels-plusContextTree.xsl) might point us strongly in the right direction. Regardless, I'd be happy to spend significant time clarifying the desired mapping to our YCBA schema.

If we're in agreement on these broad goals and mechanisms, perhaps it'd be useful to schedule a Zoom meeting between our interested parties and Mark?

yulgit1 commented 3 years ago

@flapka If I understand you correctly, you are proposing that Blacklight go beyond serving the 41 high-level IA collections and offer records for the 742 Archival Records components? And do the same for RB archives?

YCBA IA

To do this I imagine a MODS->BL Solr Workflow, given MODS availability via harvester or activity stream. A question for Mark Custer perhaps.

flapka commented 3 years ago

@yulgit1 Yes, that's correct. There are currently 742 for IA, and 8631 for RB.

flapka commented 3 years ago

@yulgit1 @rchatalbash

In a quick chat this afternoon, @edgartdata floated the intriguing idea of using LUX as a data source for YCBA archives. This would satisfy our needs in Blacklight if:

We are content with the conversion/fidelity of archives data as transformed to the LUX schema
There is a mechanism to use LUX as a data stream (filtered to YCBA material delivered from ArchivesSpace to LUX)

@yulgit1 What's your take on the 2nd question? Given the schema differences between LUX and the YCBA catalog, we'd still need to develop a transformation, but perhaps this method would be a lighter lift?

yulgit1 commented 3 years ago

Sounds like a good idea. Given (1), seems (2) (a LUX-solr to BL-solr mapping) is a viable option, and preferable to a MODS conversion. And given the primacy of LUX I have no qualms having this dependency either.

flapka commented 3 years ago

@yulgit1 I'd be happy to work with @rchatalbash and Cate on a LUX-to-YCBA mapping. Would a spreadsheet be the most useful means to document our intentions? If so, could you provide a spreadsheet with rows for all of the YCBA-Blacklight fields?

yulgit1 commented 3 years ago

@flapka ok- first let me check with Harry that the solr index is available.

yulgit1 commented 3 years ago

From email w/ Harry, there is an endpoint:

http://luxpoc.collections.yale.edu:8983/solr/#/blacklight-core/core-overview

But it is unstable and he advises against using it. @flapka maybe hold off for now?

flapka commented 3 years ago

@yulgit1 That's good to know. Does Harry advise against using it in general, or just while LUX is in development?

yulgit1 commented 3 years ago

I think just while LUX is in development.

"We should discuss with Sarah and Jeff to see if this would be feasible for LUX 1.0."

flapka commented 3 years ago

Ah. Shall we discuss possible strategies in our next Blacklight meeting?

I am at least slightly wary of the added layer of dependencies embodied in an ASpace-->LUX-->YCBA data stream. When the YCBA catalog was previously based on ODAI data, things began to break (for library data) as ODAI became more-or-less obsolete, understaffed, and minimally responsive to the evolution of library data. A repeat of that experience would be undesirable.

yulgit1 commented 3 years ago

I don't think the alternative, ASpace-->MODS-->YCBA is any better, perhaps worse as there would have to be a delivery mechanism for the MODS. And does MODS exist without LUX and/or vice-versa?

Is there a different strategy?

flapka commented 3 years ago

(Naively) I assume LUX receives archives data without a MODS intermediary. I don't know the details of that mechanism, but I assume Mark does a direct transform.

Is the alternative strategy to employ a similarly direct EAD/XML --> Blacklight transform? I think Mark has done a good job with the transform for LUX, so perhaps we could recycle (modify) more of Mark's work than I had previously suggested.

yulgit1 commented 3 years ago

@flapka - see spreadsheet of fields below.

Asterisks mean that post-index field processing occurs, so what's rendered is not necessarily in the field. When figuring out the mapping I'd suggest looking at the .json item pages.

Rows 51 and below are indexed but never used, with the exception of those marked in the "backend" column: Blacklight Fields

flapka commented 3 years ago

@yulgit1 @edgartdata @rchatalbash

This is Mark's method for EAD-to-LUX: https://github.com/fordmadox/EAD-to-LUX

There is some complexity.

flapka commented 3 years ago

@yulgit1 @edgartdata @rchatalbash

Two things:

@yulgit1 I've requested access to the spreadsheet you've linked to in your message above.
Returning to the feasibility of the LUX endpoint as a source for archives (ASpace) data: If its primary flaw is instability (for the time being), might we make do with infrequent harvests from it while developing archives integration within our Blacklight environment? Happy to discuss the flaws of this idea in our Friday meeting.

flapka commented 3 years ago

Summary of discussion on November 5:

The group considered the merits of two possible paths for archives data:

Native EAD --> new data transformation --> YCBA Solr index
Native EAD --> data transformation designed for LUX --> LUX Solr index --> lightweight data transformation (to be developed) --> YCBA Solr index

Given the complexity of the data transformation required by option 1, our consensus is that the 2nd option is the more viable approach. Though we have a small concern about the added data dependency of that approach, the concern is diminished by confidence that the EAD-to-LUX infrastructure (and support) will remain robust, largely because of YUL’s investment in making it so. If, contrary to expectations, we determine that the dependency does not fulfill YCBA’s functional requirements, we will revisit the 1st option.

YCBA’s retrieval of archives data from LUX would occur at the farthest possible point up stream in the LUX Solr index (e.g. before additional data manipulation by Harry S.), to minimize the possibility of complications/failures caused by the workings of LUX systems.

LUX data structures are scheduled to achieve relative stability by July 1, 2021. Incorporation of archives data in the YCBA production Blacklight cannot occur before that date, though development and planning work can begin before then.

Francis anticipates that most archives data provided to LUX should translate well to existing fields in the YCBA catalog. However, the nature of archives data will require implementation of YCBA fields to more fully represent object relationships and hierarchies. This development should be coordinated with corresponding work for the art and library (MARC) collections.

It’s also possible that data requirements unique to YCBA archives will require (infrequent) modifications to the EAD --> LUX data transformation.

Initial steps:

Francis will invite Mark Custer to a subsequent Friday morning meeting to discuss the viability of these plans. If that discussion encourages the planned approach, share the plan with Harry Shyket and CHITA to make sure that they know of our dependency on LUX data. Once they know there is a use case for such reuse of LUX data, they might have suggestions to support it now and in the future.
Rachel, Cate, and Francis will work on the lightweight data mapping, LUX Solr index --> YCBA Solr index.
When Eric’s time is available, implement the data mapping (step 2) to YCBA test Blacklight. Refine, etc.
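As a rough illustration of what the "lightweight" LUX Solr --> YCBA Solr mapping in the initial steps could look like, here is a minimal sketch built around a lookup table. Every field name on both sides is invented for the example; the real names would come from the two Solr schemas and the mapping spreadsheet.

```python
# Hypothetical field names on both sides; the real mapping would come from
# the LUX and YCBA Solr schemas and the planned mapping spreadsheet.
FIELD_MAP = {
    "lux_title": "title_ss",
    "lux_creator": "author_ss",
    "lux_parent_id": "ancestor_ids_ss",
}


def map_record(source_doc, field_map=FIELD_MAP):
    """Copy mapped fields into a new document and report what was left behind."""
    target_doc = {}
    unmapped = []
    for field, value in source_doc.items():
        if field in field_map:
            target_doc[field_map[field]] = value
        else:
            unmapped.append(field)
    return target_doc, sorted(unmapped)
```

Returning the unmapped fields alongside the converted document is a deliberate choice: it gives the people doing the mapping a running list of source data that has no home yet in the YCBA index.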

flapka commented 3 years ago

A summary of options and questions following additional emails from Mark Custer and Michael Appleby, to discuss in our upcoming meeting about archives and Metadata Cloud:

--

Data stream possibilities:

[current setup]

Native EAD --> new transformation (atomization) --> YCBA Solr index

Native EAD --> transformation (atomization) designed for LUX --> LUX Solr index --> transformation --> YCBA Solr index

[future setup, with Metadata Cloud]

Native EAD --> transformation (atomization) via Metadata Cloud --> transformation --> LUX Solr index --> transformation --> YCBA Solr index

Native EAD --> transformation (atomization) via Metadata Cloud --> transformation --> YCBA Solr index

If Metadata Cloud will indeed be the source of archives data down the road, YCBA can choose between options 1, 3, and 4. I assume options 3 and 4 are immediately more appealing because the data made available via Metadata Cloud will have already atomized the EAD files – i.e. converted a single complex (hierarchical) archival description into its component parts, which seems the most difficult part of the transformation work. If that is indeed true, and if the MC serialization is complete and correct, I lean towards a preference for option 4, as it eliminates the LUX dependency and leaves us with only a lightweight (?) transformation requirement for mapping to our YCBA Solr index.

Questions for Mark and Michael (about the archives data stream in Metadata Cloud):

What do you see as YCBA’s most sensible data stream option?
In MC, what’s the method for harvesting sets of archival data (from a given repository)?
What data formats are available from MC?
Will MC express every piece of data present in ASpace?
Will MC express ASpace data describing entities other than archival resources/components – especially digital objects, subjects, and agents?
If “deletes are not supported” (per Michael’s earlier email), do we need to develop a customized method for deleting objects from our YCBA Solr index (similar to what we have now)?
What’s the earliest date that we might expect to use MC as the source of archives data (if we choose that path)?

Questions for Michael concerning MC as source for Voyager data:

Is MC a production-ready source of Voyager data now?
Does it support deletes?
Is there documentation about how to work with it?
Could we see example BIB / Holdings data in MC? Even better, are there live example URLs we could look at?

flapka commented 3 years ago

The meeting on 12-14-2020 with Michael and Mark did not produce a clear preferred path for consumption of archives data. There was discussion of whether we might benefit from data enhancements provided farther down the data pipeline in LUX. We agreed to revisit this evaluation in late spring/early summer, 2021.

I'd like to hear the major takeaways from CIA & Friends attendees, perhaps in our next meeting (Jan. 15?).

flapka commented 3 years ago

Email from Mark Custer, December 17:

All,

I wanted to send a quick follow up note after our conversation this week. I'm not sure if we answered all your original questions, Francis, but hopefully we nevertheless covered some useful ground and sparked some ideas.

One thing that I really want to make sure that I convey, though, upon reflection from our meeting, is that you have full and complete access to YCBA's data in ArchivesSpace. Anyone at YCBA can get an ArchivesSpace user account (as Rachel, Francis, Cate, and others already have) and with that user account you get access to the staff interface as well as to the Application Programming Interface (API), which simply means that you've got lots of options.

So, just to make this point clear since I think that I neglected to do so when we met: I still think that the easiest option would be to get the data from LUX or YUL's Metadata Cloud, but if you want to have full control over the process (even just to experiment), as it sounds like you do now with getting the MARC records, then you can get your data directly from ArchivesSpace using a variety of different options. I'd certainly recommend using the existing Export Service (https://github.com/hudmol/archivesspace_export_service), whether piggybacking on our current one or spinning up your own, but even without using that full pipeline, you already have access to the "resource-update-feed" API endpoint that the Export Service also uses. That API endpoint will give you a list of updated or deleted finding aids for any time period that you want (e.g. our current pipeline just asks for what's changed in the previous day since it runs on a nightly basis).

Either way, in January, I can start sending Eric copies of the JSON files that we're sending to LUX during the proof-of-concept phase of that project. I'm not sure exactly what your Solr schema looks like, but I also suspect it wouldn't be very difficult to go directly from EAD to that, or even directly from ASpace's JSON to that format, etc.

Email response from Eric James, December 17

Thanks for all of this Mark. One approach caught my eye within the AS export service with EAD directly, “For each resource in the adds list it exports it as EAD and then places it in a pipeline for processing. The pipeline for each job is configured in config/jobs.rb. It might include steps such as EAD validation, XSLT transformation, and ultimately a publication step.”
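The adds-then-pipeline flow quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the feed is modeled as a plain dict with `adds` and `removes` lists (the actual response shape of the "resource-update-feed" endpoint is not shown in this thread), the `export` callable stands in for whatever fetches the EAD, and the step functions are placeholders rather than real validation or XSLT.

```python
def validate(record):
    """Placeholder validation step: require a non-empty export."""
    if not record["ead"].strip():
        raise ValueError(f"empty export for {record['id']}")
    return record


def transform(record):
    """Placeholder for an XSLT or other EAD-to-Solr transformation."""
    return {"id": record["id"], "size": len(record["ead"])}


def run_update(feed, export, index, steps=(validate, transform)):
    """Apply one feed to an index (here just a dict keyed by identifier)."""
    for identifier in feed.get("adds", []):
        record = {"id": identifier, "ead": export(identifier)}
        for step in steps:
            record = step(record)
        index[identifier] = record      # stands in for the publication step
    for identifier in feed.get("removes", []):
        index.pop(identifier, None)     # drop deleted finding aids
    return index
```

Keeping the steps as an ordered tuple of functions mirrors the configurable job pipeline described in the quote, so a new step could be slotted in without touching the loop.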

flapka commented 2 years ago

[Email exchange, October 2021]:

On Oct 8, 2021, at 10:39 AM, Lapka, Francis francis.lapka@yale.edu wrote:

Mark, Michael,

My YCBA colleagues and I are thinking again about techniques for integrating archival descriptions into our YCBA catalog, though implementation is not imminent (FY23?). When we met in December (2020), we articulated three options:

ArchivesSpace API
Metadata Cloud
LUX as data endpoint (with enrichments / alterations added by LUX)

Intuitively, the Metadata Cloud option looks most appealing to me, as it seems a nice middle ground: it already includes useful (?) transformations, compared to the raw AS API data, without the additional dependencies and modifications introduced by LUX. But that assessment is based on an incomplete understanding of how the archives data is actually rendered in that Metadata Cloud.

With apologies that the following questions may repeat what I’ve asked before:
• Is archives data in Metadata Cloud now full, stable, and a viable candidate for consumption?
• Regardless, are there URIs that you could share that demonstrate how archives data is surfaced in that context? (If not, screenshots, or some other representation?)

Thanks, Francis

From: Appleby, Michael michael.appleby@yale.edu Sent: Thursday, October 14, 2021 1:22 PM To: Lapka, Francis francis.lapka@yale.edu Cc: Custer, Mark mark.custer@yale.edu; James, Eric eric.james@yale.edu; Delmas-Glass, Emmanuelle emmanuelle.delmas-glass@yale.edu; Chatalbash, Rachel rachel.chatalbash@yale.edu Subject: Re: archives, YCBA Blacklight, Metadata Cloud discussion

Hi Francis,

On a technical level Metadata Cloud has been performing well and is in production - it provides all of the descriptive metadata for https://collections.library.yale.edu. I think Mark is the one who could speak to whether the data provided by MC will meet your requirements.

We are still in the process of migrating content to the new site and are also adding features to the UI. For instance we recently added a facet for archival collection and a display of the archival hierarchy for items that draw their metadata from ASpace. If you want to see this in action, facet on Repository “Beinecke Rare Book and Manuscript Library” and you will then see a Collection Title facet appear. Here is a search that returns the Langston Hughes collection: https://collections.library.yale.edu/catalog?f%5Bcollection_title_ssi%5D%5B%5D=Langston+Hughes+papers+%28JWJ+MSS+26%29&f%5Brepository_ssi%5D%5B%5D=Beinecke+Rare+Book+and+Manuscript+Library&per_page=10&search_field=all_fields. Items are ordered according to their order in ASpace. You can use the hierarchy/“breadcrumb” links to scope your search to any level in the collection.

While I would be happy to have you use Metadata Cloud there would be a caveat that we are still making minor changes. Depending on your implementation timeline this might not be an issue, we’ve already done a lot of the UI work which helped us sort out what data was required. Another thing we’d need to do is run this by the “service” side of Library IT, since Metadata Cloud has been for internal use only we skipped over some of the support/SLA documentation that we would produce for a more visible service. So I’d want to run this by our leadership team in LIT before we commit.

Best, Michael

From: Lapka, Francis Sent: Wednesday, October 20, 2021 9:49 AM To: Appleby, Michael michael.appleby@yale.edu Cc: Custer, Mark mark.custer@yale.edu; James, Eric eric.james@yale.edu; Delmas-Glass, Emmanuelle emmanuelle.delmas-glass@yale.edu; Chatalbash, Rachel rachel.chatalbash@yale.edu Subject: RE: archives, YCBA Blacklight, Metadata Cloud discussion

Thanks Michael,

Comments, sorted by topic:

Data provided by Metadata Cloud: Mark, I’m keen to hear if MC omits any significant data elements describing archival objects/records. In the Digital Collections examples, the results look rather complete.
Digital Collections comparison: I think the MC->Digital Collections example is a useful parallel, as YCBA’s consumption and display of archives data could be similar. We might even hope to borrow some elements of the Digital Collections display – such as the context tree for archival objects.
MC, internal only or visible service: I’m interested in MC as a possible source of ASpace data and Voyager data – the latter because the libapp data service omits multiple important data elements from MARC holdings records (elements that I have reason to believe have been/will be implemented in MC). If we can confirm that MC provides all the data that YCBA would like to have in our Blacklight catalog, I’d be quite keen for it to be a visible service.
YCBA timeline: We envision an implementation of archives data in the YCBA catalog no sooner than FY23. Nonetheless, we’d like to begin planning/mapping in the present fiscal year.

Thanks again,

Francis

flapka commented 2 years ago

[Email from Mark Custer, Nov. 4, 2021, responding to the above]

Francis,

There are still a few refinements that we'll need to make for the Metadata Cloud (MC) mappings (both from ArchivesSpace and Voyager) before I would consider them stable for the use that you describe, but I think that those should be in place soon enough. So, based on your timeline, I'd say that using the MC should be a pretty straightforward option, as long as the leadership team in LIT allows the MC to be used for that purpose, as Michael mentioned.

I'm about to head out on vacation from 11/4 - 11/14, but I'd be happy to meet at any point after 11/15 to discuss the mappings in more depth.

Mark

flapka commented 2 years ago

Following the latest update from Mark Custer (above), I'd like the group's quick take on two questions:

Would we like to request that YUL make Metadata Cloud available as a visible service?
Even if MC becomes a viable data stream for us eventually, it's unlikely to happen until FY23 (right?). Do we have expectations from YCBA leadership to implement an interim solution before then, for at least some archival collections? (See related discussion in issue #328).

flapka commented 2 years ago

@yulgit1 @edgartdata @rchatalbash

Rephrasing my first question above, can we think of any reason not to ask YUL to make Metadata Cloud available as a service (visible to external units)? If we were to do so, I believe our elaborations would be as follows:

We envision MC as the most viable data stream for integrating YCBA ArchivesSpace data, taking advantage of MC's transformation of finding aids into component parts
We envision MC as a replacement for YUL's OAI (libapp) service for retrieving YCBA Voyager data, especially if MC provides more complete MARC Holdings data compared to the OAI service.

Assumptions:

MC does indeed contain all the data needed (there's suggestion that it does, but it's impossible to confirm without actually seeing it)
YCBA record sets can be isolated in MC
MC allows occasional full harvests of YCBA record sets and allows us to isolate/retrieve records that have been added or edited. A mechanism allowing us to identify records that have been deleted would be a bonus (this isn't possible in OAI/libapp).
The service would be made visible/available by FY23

Does that sound right to you all? Other thoughts? Should we formalize this request to Michael Appleby?
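On the deletes assumption: if the source cannot report deletions, one fallback is to compare the identifiers from an occasional full harvest against what is already in the YCBA index. A minimal sketch, assuming both sides can be reduced to lists of stable identifiers (the function name and identifier format are hypothetical):

```python
def plan_sync(harvested_ids, indexed_ids):
    """Compare a full harvest with the current index and list the differences."""
    harvested = set(harvested_ids)
    indexed = set(indexed_ids)
    return {
        "to_add": sorted(harvested - indexed),      # new since the last harvest
        "to_delete": sorted(indexed - harvested),   # gone from the source
    }
```

This only works with full harvests; an incremental harvest of added or edited records says nothing about what has disappeared.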

flapka commented 2 years ago

Email sent 2021 Nov 19:

From: Lapka, Francis Sent: Friday, November 19, 2021 10:52 AM To: Appleby, Michael michael.appleby@yale.edu; Custer, Mark mark.custer@yale.edu Cc: James, Eric eric.james@yale.edu; Delmas-Glass, Emmanuelle emmanuelle.delmas-glass@yale.edu; Chatalbash, Rachel rachel.chatalbash@yale.edu Subject: RE: archives, YCBA Blacklight, Metadata Cloud discussion

Michael,

In a meeting of the Center’s online collections group this morning, we had consensus that we would like to (formally) request for YUL’s Metadata Cloud to be made available as a visible service – visible at least to YCBA data harvesting processes, in a manner that creates a minimum of work for YUL IT.

This request is based on the following aspirations:

We envision MC as the preferred data stream for integrating YCBA ArchivesSpace data, taking advantage of MC's transformation of finding aids into component parts.
We envision MC as a replacement for YUL's OAI (libapp) service for retrieving YCBA Voyager data, especially if MC provides more complete MARC Holdings data compared to the OAI service.

Viability depends on these criteria:

• MC does indeed contain all the data needed (our conversations to date suggest that it probably does, but it's impossible to confirm without actually seeing the data)
• YCBA record sets can be isolated in MC
• MC allows occasional full harvests of YCBA record sets and allows us to isolate/retrieve records that have been added or edited in a given interval. A mechanism allowing us to identify records that have been deleted would also be desirable (this isn't possible in OAI/libapp).
• The service would be made visible to YCBA by FY23

Would it be reasonable at this juncture to pass the request to the LIT leadership team? I wonder also if there’s a way that we could see examples of the data – to give us a clearer sense of whether it would be a viable data source, before moving forward with the request.

Keen to hear your thoughts or concerns,

Francis

flapka commented 2 years ago

[Email sent 2022 March 31]

From: Appleby, Michael michael.appleby@yale.edu Sent: Thursday, March 31, 2022 3:03 PM To: Lapka, Francis francis.lapka@yale.edu Cc: Custer, Mark mark.custer@yale.edu; James, Eric eric.james@yale.edu; Delmas-Glass, Emmanuelle emmanuelle.delmas-glass@yale.edu; Chatalbash, Rachel rachel.chatalbash@yale.edu Subject: Re: Metadata Cloud, today's presentation (YCBA)

Hi Francis,

Apologies for the long-delayed reply. Some MC development would be required to meet the needs you have outlined:

For a given resource MC will provide the list of parents up to the repository level, but not the children. For Ladybird records we have a parameter to request that the child records be included in the response, I imagine we would add the same functionality for the MC ASpace API endpoint.
MC does not return the repository-level record. I think this would be needed to allow traversal of the full set of YCBA records.

I would be happy to meet in the near future to review requirements. LIT would then need to develop an estimate for the work. If the development time required is under a week then it should not require a formal project plan or ITSC approval, but I’d need to confirm that with Dale.

Best, Michael

flapka commented 2 years ago

Notes from Metadata Cloud / YCBA discussion, May 31, 2022

Attendees: Michael Appleby, Rachel Chatalbash, Mark Custer, Emmanuelle Delmas-Glass, Eric James, Francis Lapka

Summary of goals, if viable:

Harvest YCBA archives (AS) data from Metadata Cloud, for outcomes that are similar to archives data in YUL digital collections and (eventually) LUX
Harvest YCBA MARC (Voyager) data from Metadata Cloud, if MC provides Holdings data that’s more complete than what’s offered via OAI/libapp

Questions / notes

What routines do we envision for harvesting (and regularly updating) complete archives (or MARC) data from MC?
- Archives:
  - In MC, YCBA archives in ArchivesSpace can be easily isolated via our numerical repository code at the root of resource URIs, i.e. (at present) https://archives.yale.edu/repositories/2 for RBM and https://archives.yale.edu/repositories/3 for Archives
  - As YCBA proceeds with consolidation of archival collections, we anticipate that we will soon have a single repository code.
  - Our goal is to harvest every level of hierarchy in archive descriptions. The harvest could do this by working our way down from the root (collection-level) node
  - Voyager (MARC): A query against MC's “repository” field will isolate all YCBA records in Voyager
What challenges remain to be resolved?
What are the next steps?
- Confirm that the data we want is there (June)
  - Consult mapping document: https://metadata-api.library.yale.edu/metadatacloud/public/index.html
  - Send list of names / net IDs of YCBA folks needing access to the Metadata Cloud API Debugger interface (to facilitate data mapping review) [update 6/22: Francis and Eric have been given access]
- Outline development work / project plan – is sure to be over 40 hours
- Project plan would need approval from the YUL IT steering committee (at least)
- We would also need to establish a YCBA-YUL service agreement, which among other things would describe how service changes desired by YCBA might be handled. YCBA already has a similar service agreement in place for our use of Aeon.
What contributions are needed from YCBA to proceed?
- Review MC data. Document data requirements not met by MC, if any.
- Develop and test data consumption routines
Can we envision a timeframe in FY23?
- Michael’s team should have time available to work on this in late summer or fall (2022)
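The top-down harvest mentioned in the notes (start at the collection-level node and work down through every level) could be sketched as a breadth-first walk. This assumes some way of asking for a record's children, which per Michael's March email is not yet available for ASpace records in MC; here an in-memory dict with invented URIs stands in for those responses.

```python
from collections import deque

# Stand-in for child lookups: URI -> list of child URIs. All values are invented.
CHILDREN = {
    "/repositories/3/resources/1": [
        "/repositories/3/archival_objects/10",
        "/repositories/3/archival_objects/11",
    ],
    "/repositories/3/archival_objects/10": ["/repositories/3/archival_objects/12"],
}


def in_repository(uri, repository_code):
    """True when a URI sits under the given numerical repository code."""
    return uri.startswith(f"/repositories/{repository_code}/")


def harvest(root_uri, fetch_children):
    """Breadth-first walk from a collection-level node, returning (uri, depth) pairs."""
    seen = set()
    queue = deque([(root_uri, 0)])
    ordered = []
    while queue:
        uri, depth = queue.popleft()
        if uri in seen:
            continue                      # guard against repeated or cyclic links
        seen.add(uri)
        ordered.append((uri, depth))
        for child in fetch_children(uri):
            queue.append((child, depth + 1))
    return ordered
```

For example, `harvest("/repositories/3/resources/1", lambda uri: CHILDREN.get(uri, []))` visits the resource first, then its two direct children, then the grandchild. In a real harvest `fetch_children` would wrap a network call instead of a dict lookup.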

flapka commented 2 years ago

I've started to review data and mappings provided via the Metadata Cloud service. It still looks like a viable data stream, but my review so far suggests:

The mapping work will be more time-intensive than previously envisioned;
There are a number of decisions to be made that should involve input from a YCBA archivist (much more than @rchatalbash has time to do right now).

With the above in mind, Rachel and I agree that it would be sensible to hit pause on this until we've hired an archivist.

yulgit1 commented 2 months ago

On integrating images into YCBA finding aids: https://github.com/ycba-cia/blacklight-collections2/issues/328

yulgit1 commented 2 months ago

From: Francis Lapka [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu) Date: Thursday, April 20, 2023 at 9:08 AM To: "Chatalbash, Rachel" [rachel.chatalbash@yale.edu](mailto:rachel.chatalbash@yale.edu), "Quagliaroli, Jessica" [jessica.quagliaroli@yale.edu](mailto:jessica.quagliaroli@yale.edu), "Rinn, Meg" [meg.rinn@yale.edu](mailto:meg.rinn@yale.edu) Cc: Emmanuelle Delmas-Glass [emmanuelle.delmas-glass@yale.edu](mailto:emmanuelle.delmas-glass@yale.edu), "eric.james@yale.edu" [eric.james@yale.edu](mailto:eric.james@yale.edu) Subject: RE: YCBA Finding Aids -- Wednesday's meeting

All,

Some quick addenda to (or summaries for) yesterday’s discussion.

The three data sources under consideration, so far, are:

The native EAD, via the ArchivesSpace Export Service, https://github.com/hudmol/archivesspace_export_service. Pro: Gives us complete control for data mapping. Con: Would require us to write our own data transformation.
YUL’s MetadataCloud. Pro: The EAD data is already transformed into a set of records representing all of the resources & archival objects. Con: In some places, the data transformations may not perfectly align with YCBA catalog use cases.
The LUX index for records from ArchivesSpace. Pro/Con: similar to MetadataCloud, but my sense is that the data transformed for LUX may be even farther away from the source data (compared to MC). But you might want to check with Alicia Detelich to be sure.

To evaluate the MetadataCloud mappings, YUL has pointed us to their Metadata Cloud API Debugger. Everyone should have access to this: https://metadata-api-uat.library.yale.edu/metadatacloud/public/index.html.

To see how specific resources/objects are transformed in MC, you need login credentials, which Eric and I currently have. I’m sure Jessica and Meg could be added too, if we request it (through Michael Appleby & Martin Lovell). I attach two example screenshots of what you can see with that login.

For the YCBA catalog, I believe all the fields are represented in this document (Eric can correct me if I’m wrong): https://docs.google.com/spreadsheets/d/1aZ6s3re_kT5NlFmc2UPEsbjq_l0x5b0BMydGre36vOg/edit?gid=0#gid=0

Using that as a base, my quite tentative MetadataCloud-to-YCBA mapping for archives is found in this document, which should be taken with many grains of salt (plus I didn’t get very far): https://docs.google.com/spreadsheets/d/1k4PlKdy2AEjbG8wKfFezyHcSD-DPwCnLv6XK42uXeO8/edit?gid=0#gid=0

Francis

yulgit1 commented 2 months ago

From: Lovell, Martin [martin.lovell@yale.edu](mailto:martin.lovell@yale.edu) Sent: Friday, July 15, 2022 4:48 PM To: Lapka, Francis [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu) Cc: Appleby, Michael [michael.appleby@yale.edu](mailto:michael.appleby@yale.edu); James, Eric [eric.james@yale.edu](mailto:eric.james@yale.edu) Subject: Re: today's Metadata Cloud / YCBA discussion

For that particular oid (10731241), it looks like DCS is still pointing to ladybird. So: https://metadata-api.library.yale.edu/metadatacloud/api/1.0.1/ladybird/oid/10731241?include-children=1&mediaType=json instead of https://metadata-api.library.yale.edu/metadatacloud/api/1.0.1/ils/barcode/39002130055156?bib=9730466&mediaType=json

If it were changed to point to Voyager instead of ladybird, it would update the genre facets.
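For comparing the two records side by side, the request URLs above can be built programmatically. This is only a sketch: the base path and query parameters are copied from the two URLs quoted in this message, the function names are made up for illustration, and authentication is not handled here.

```python
from urllib.parse import urlencode

# Base path copied from the URLs quoted above.
BASE = "https://metadata-api.library.yale.edu/metadatacloud/api/1.0.1"


def ladybird_url(oid, include_children=True):
    """Build the ladybird-by-OID request URL (hypothetical helper)."""
    params = {}
    if include_children:
        params["include-children"] = 1
    params["mediaType"] = "json"
    return f"{BASE}/ladybird/oid/{oid}?{urlencode(params)}"


def ils_barcode_url(barcode, bib):
    """Build the ILS (Voyager) by-barcode request URL (hypothetical helper)."""
    params = {"bib": bib, "mediaType": "json"}
    return f"{BASE}/ils/barcode/{barcode}?{urlencode(params)}"


if __name__ == "__main__":
    print(ladybird_url(10731241))
    print(ils_barcode_url("39002130055156", 9730466))
```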

Martin

On Jul 15, 2022, at 3:18 PM, Lapka, Francis [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu) wrote:

Michael, Martin,

I’m just beginning to tip a toe into the Metadata Cloud API Debugger tool. For the moment, I’m comparing:

/ils/barcode/39002130055156?bib=9730466 [and] https://collections.library.yale.edu/catalog/10731241

If Metadata Cloud is the source of data for the latter, could you tell me the data serialization that the DCS uses to harvest it? I ask because the API Debugger view of the data clearly suppresses some of the markup, e.g.:

API Debugger has: Stipple engravings England London 1790

Which gets rendered in DCS as: Stipple engravings -- England -- London -- 1790

That markup sometimes is essential to how the data maps or displays. To fully evaluate MC as a possible source of data for the YCBA catalog, I’d need to see the native serialization(s), if that’s the right term. Would that be possible?
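To illustrate why the serialization matters, here is a minimal sketch. It assumes the native data keeps the heading's subdivisions as separate values, which this thread does not confirm; the function name is hypothetical. With the parts kept separate, the DCS-style display can be rebuilt; once they are flattened into one string, the boundaries cannot be reliably recovered.

```python
def render_heading(parts, sep=" -- "):
    """Join heading subdivisions with a visible separator, skipping blanks."""
    return sep.join(p.strip() for p in parts if p.strip())


# Assumed structured form of the example heading from this message.
parts = ["Stipple engravings", "England", "London", "1790"]

structured = render_heading(parts)  # separator-delimited display form
flattened = " ".join(parts)         # what the debugger view appears to show

if __name__ == "__main__":
    print(structured)
    print(flattened)
```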

Francis

From: Appleby, Michael [michael.appleby@yale.edu](mailto:michael.appleby@yale.edu) Sent: Monday, June 13, 2022 10:58 AM To: Lapka, Francis [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu) Cc: James, Eric [eric.james@yale.edu](mailto:eric.james@yale.edu); Lovell, Martin [martin.lovell@yale.edu](mailto:martin.lovell@yale.edu) Subject: Re: today's Metadata Cloud / YCBA discussion

Hi Francis, Eric,

Martin has added the accounts to the credentials file and they will be created in production during our Wednesday deployment window. He can send you the credentials.

Best, Michael -- Michael Appleby Director of Software Engineering Library Information Technology Yale University Library 130 Wall St. Room 606 New Haven, CT 06520

michael.appleby@yale.edu

On Jun 10, 2022, at 10:16 AM, Lapka, Francis [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu) wrote:

Hi Michael.

Might it still be possible for Eric and me to be granted access to the MC API debugger? Our NetIDs are ermadmix and fl232, respectively.

Thanks! Francis

From: James, Eric [eric.james@yale.edu](mailto:eric.james@yale.edu) Sent: Wednesday, June 1, 2022 2:36 PM To: Custer, Mark [mark.custer@yale.edu](mailto:mark.custer@yale.edu); Lapka, Francis [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu); Appleby, Michael [michael.appleby@yale.edu](mailto:michael.appleby@yale.edu); Chatalbash, Rachel [rachel.chatalbash@yale.edu](mailto:rachel.chatalbash@yale.edu); Delmas-Glass, Emmanuelle [emmanuelle.delmas-glass@yale.edu](mailto:emmanuelle.delmas-glass@yale.edu) Subject: Re: today's Metadata Cloud / YCBA discussion

Thanks, Mark. Thinking about the EAD option, I guess you could parse the raw GitHub copy of the README for the EAD IDs and construct the GitHub URL for each EAD file from those. Or is there a better way? And are there existing EAD XSLTs that output Solr docs for each archival object (I’m assuming that’s what we want in the Blacklight Solr index)?

https://raw.githubusercontent.com/YaleArchivesSpace/Archives-at-Yale-EAD3/master/ycba-ead/README.md
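A rough sketch of the README-parsing idea follows. It rests on two unverified assumptions: that the README mentions each EAD file by a name ending in `.xml`, and that those files sit in the same `ycba-ead` directory as the README. The function name and the sample text are hypothetical, and fetching the README itself (e.g. with `urllib.request`) is left out.

```python
import re

# Assumed location of the EAD files: same directory as the README above.
RAW_BASE = (
    "https://raw.githubusercontent.com/YaleArchivesSpace/"
    "Archives-at-Yale-EAD3/master/ycba-ead/"
)

# Assumption: EAD file names look like "<something>.xml".
XML_NAME = re.compile(r"[\w.\-]+\.xml")


def ead_urls_from_readme(readme_text, base=RAW_BASE):
    """Return one raw-file URL per distinct .xml name found in the README."""
    names = dict.fromkeys(XML_NAME.findall(readme_text))  # ordered de-dupe
    return [base + name for name in names]


if __name__ == "__main__":
    # Hypothetical README row; the real layout may differ.
    sample = "| [example.0001.xml](example.0001.xml) | Example papers |"
    for url in ead_urls_from_readme(sample):
        print(url)
```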

And maybe a question for Francis: should we even consider this, i.e., would the EADs have all the metadata we’d want indexed, or do we need specific metadata that is only in Metadata Cloud?

Eric James Software Engineer Yale Center for British Art 1080 Chapel Street, PO Box 208280 New Haven, CT 06520-8280 203-432-9411 | britishart.yale.edu


From: "Custer, Mark" [mark.custer@yale.edu](mailto:mark.custer@yale.edu) Date: Wednesday, June 1, 2022 at 12:47 PM To: Francis Lapka [francis.lapka@yale.edu](mailto:francis.lapka@yale.edu), "Appleby, Michael" [michael.appleby@yale.edu](mailto:michael.appleby@yale.edu), "Chatalbash, Rachel" [rachel.chatalbash@yale.edu](mailto:rachel.chatalbash@yale.edu), Emmanuelle Delmas-Glass [emmanuelle.delmas-glass@yale.edu](mailto:emmanuelle.delmas-glass@yale.edu), "eric.james@yale.edu" [eric.james@yale.edu](mailto:eric.james@yale.edu) Subject: Re: today's Metadata Cloud / YCBA discussion

Francis, all:

Sorry to slip in a few other questions at the end of yesterday's meeting, which I didn't articulate very well. I'll do my best to clarify here regarding the EAD option:

Since I was not sure if there was a desire or not to keep how the MARCXML records are processed exactly as they are now, rather than using the Metadata Cloud to re-architect the entire workflow, I was just mentioning that all your EAD XML files are kept up to date on a nightly basis in GitHub (see https://github.com/YaleArchivesSpace/Archives-at-Yale-EAD3/blob/master/ycba-ead/README.md and https://github.com/YaleArchivesSpace/Archives-at-Yale-EAD3/blob/master/ycba-ia-ead/README.md). Right now, that's just over 100 EAD input files, which would result in 6,633 Solr records (one for each published archival object and resource record in ASpace; and, you should eventually have the same number of Solr records in QuickSearch for those published objects in ASpace). There are a lot of unpublished archival objects, so the total count in ArchivesSpace for both YCBA repositories is 48,090 (130 Resource records + 47,960 archival components). Still, that's not many records to have to deal with, and re-processing an entire EAD file if just one archival object changed wouldn't take much longer than requesting a single change from an activity stream.
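As a sketch of the "one Solr record per resource and archival object" idea, the snippet below flattens an EAD file into a list of dictionaries. It assumes components are encoded as `<c>` (or numbered `<c01>`, `<c02>`, ...) elements carrying `id` and `level` attributes, with the title in `did/unittitle`; the output field names are hypothetical and are not the YCBA or Metadata Cloud schema.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def is_component(name):
    """True for <c> and numbered components such as <c01>."""
    return name == "c" or (len(name) == 3 and name[0] == "c" and name[1:].isdigit())


def first_child(el, name):
    """Return the first direct child with the given local name, or None."""
    if el is None:
        return None
    for child in el:
        if local(child.tag) == name:
            return child
    return None


def text_of(el):
    """Collapse all nested text of an element into one clean string."""
    return " ".join("".join(el.itertext()).split()) if el is not None else ""


def flatten_ead(xml_text):
    """Return one dict per resource (archdesc) and per component, in document order."""
    docs = []

    def walk(el, parent_id):
        for child in el:
            name = local(child.tag)
            if name == "archdesc" or is_component(name):
                did = first_child(child, "did")
                doc = {
                    "id": child.get("id"),
                    "level": child.get("level"),
                    "title": text_of(first_child(did, "unittitle")),
                    "parent_id": parent_id,
                }
                docs.append(doc)
                walk(child, doc["id"])
            else:
                walk(child, parent_id)

    walk(ET.fromstring(xml_text), None)
    return docs
```

Because every component is re-emitted on each run, re-processing a whole file after a single change, as suggested above, needs no change tracking.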

But, I could be wrong about not wanting to change how the MARC records are harvested, and perhaps this is an opportunity to re-conceptualize the whole pipeline. If so, then I would go with the Metadata Cloud route, or just piggyback on the YCBA-specific portion of the Solr index from QuickSearch, if that's permitted (e.g. the Beinecke's previous digital library relied upon the CCD Solr index that powered https://discover.odai.yale.edu/, as did other services, if I'm not mistaken).

Anyhow, once you start reviewing the ASpace Metadata Cloud mappings, let me know if there are any questions. There are still some pending mapping changes that need to be added (extent statements, for instance), and there might be a few confusing parts, like the archivalSort key. That field has been added so that users can sort a flattened list of Solr results in finding-aid order, but only when compared with other results from the same collection. For example, this level of description, https://archives.yale.edu/repositories/2/archival_objects/3240, has an archival sort value of "00001.00000". The segments count from zero, so all that means is that the current Solr record is an archival object that occurs within the second grouping (00001 = 2) of that collection, as the first item (00000 = 1) in that grouping, which lines up with what you see in the "Navigate the collection" hierarchical view in Archives at Yale.
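A small sketch of how such a key could be read and used for sorting. It assumes the key is always a dot-separated run of zero-padded, zero-based positions, as in the single example above; the function names are hypothetical.

```python
def parse_archival_sort(key):
    """Turn "00001.00000" into one-based positions, e.g. (2, 1)."""
    return tuple(int(part) + 1 for part in key.split("."))


def finding_aid_order(keys):
    """Sort keys in finding-aid order.

    Only meaningful for records from the same collection.
    """
    return sorted(keys, key=parse_archival_sort)


if __name__ == "__main__":
    print(parse_archival_sort("00001.00000"))
    print(finding_aid_order(["00001.00000", "00000.00002", "00001"]))
```

Comparing the parsed tuples rather than the raw strings keeps a parent ("00001") ahead of its children ("00001.00000") even if the padding width ever varies.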

Best,

Mark

ycba-cia / blacklight-collections2

Harvest full archival descriptions (in EAD / ArchivesSpace) for collections in RB and IA #152