sul-dlss / stanford-arclight

Stanford-specific ArcLight app
https://archives.stanford.edu

Can we integrate with Aeon in the same manner as we do in OAC? #453

Closed: corylown closed this issue 6 months ago.

taylor-steve commented 6 months ago

This seems very possible to do by using the same "internal" setup as OAC, as described here: https://support.atlas-sys.com/hc/en-us/articles/360011919533-Submitting-Requests-via-EAD-Finding-Aids. My current thinking is a three-step approach:

1. Modify ArchivesSpace EAD XML to match the format expected by current OAC Aeon process

The finding aid XML from ASpace needs slight modifications to match the format the OAC Aeon XSLT file expects. These appear minor so far: some namespace changes and some case changes (e.g., type="folder" to type="Folder").
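A minimal sketch of the case change, written in Python with only the standard library so it is self-contained (the app itself is Ruby). It assumes the only case difference aeon.xslt cares about is the first letter of the container type attribute, as in the folder/Folder example; the real stylesheet may expect other changes as well.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return an element's tag without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def capitalize_container_types(root: ET.Element) -> ET.Element:
    """Upper-case the first letter of every <container type="..."> value.

    Assumption: aeon.xslt matches values such as "Folder" and "Box",
    while ASpace exports "folder" and "box". Only the first character
    changes; the rest of the value is left as exported.
    """
    for el in root.iter():
        if isinstance(el.tag, str) and local_name(el.tag) == "container":
            value = el.get("type")
            if value:
                el.set("type", value[:1].upper() + value[1:])
    return root


ead = ET.fromstring(
    '<ead><archdesc><dsc><c01><did>'
    '<container type="box">1</container>'
    '<container type="folder">2</container>'
    '</did></c01></dsc></archdesc></ead>'
)
capitalize_container_types(ead)
print([c.get("type") for c in ead.iter("container")])  # ['Box', 'Folder']
```

Matching on the local name means the function works whether or not the namespaces have already been stripped.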

Currently blocked on this step. A request has been placed with Atlas for a copy of our current OAC aeon.xslt file, or to be pointed to someone who can access it. Once we have it, determining the rest of the modifications should be straightforward. I've tested with a sample aeon.xslt file, and both the OAC and the ASpace EADs generate the same output with the modifications mentioned above, so I am optimistic.

Note: There is a risk that I'm missing a step, but after reading the linked Atlas document I'm confident this approach should be supported.

2. Enable EAD downloads in stanford-arclight.

We need publicly accessible EAD files for Aeon. ArcLight supports building EAD URLs by configuring the ead template setting in config/downloads.yml (e.g., http://localhost:3000/public/eads/%{unitid}.xml).
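To make the template concrete, here is a Python sketch of expanding a Ruby-style %{unitid} placeholder. It only illustrates the substitution; it is not ArcLight's code, the unitid "sc0097" is hypothetical, and percent-encoding the value is our own precaution (we have not confirmed whether ArcLight escapes unitids).

```python
import re
from urllib.parse import quote

# Ruby-style named placeholder, as used in config/downloads.yml: %{unitid}
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def expand_template(template: str, **values: str) -> str:
    """Replace each %{name} in template with the matching value.

    Values are percent-encoded (an assumption of this sketch). A missing
    key raises KeyError, much like Ruby's format does.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder %{{{key}}}")
        return quote(str(values[key]), safe="")

    return PLACEHOLDER.sub(substitute, template)


template = "http://localhost:3000/public/eads/%{unitid}.xml"
print(expand_template(template, unitid="sc0097"))
# http://localhost:3000/public/eads/sc0097.xml
```

Whatever URL this produces is the value Aeon would later be asked to fetch, so it has to be reachable without authentication.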

We'll need to either:

We will need to decide whether we are comfortable exposing the modified XML files to everyone, or whether we want to expose the unmodified ASpace XML for users to download and provide a separate download path for the modified files used by Aeon.

3. Enable Aeon requesting in stanford-arclight.

ArcLight supports Aeon requests in the same fashion we use for OAC. This is configurable per repository in the config/repositories.yml file:

  request_types:
    aeon_web_ead:
      request_url: 'https://sample.request.com'
      request_mappings: 'Action=10&Form=31&Value=ead_url'

We'd use our current Stanford OAC URL for request_url. The ead_url placeholder is currently replaced with the URL generated for the EAD download in step 2. If we decide in step 2 not to serve the modified EADs to everyone, we might need to override some of this functionality, either to add a new template variable or to modify ead_url in that case.
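A Python sketch of that substitution as we understand it: parse request_mappings as a query string, swap the literal value ead_url for the real EAD URL, and append the result to request_url. This illustrates the idea and is not ArcLight's implementation; the sample request_url comes from the config above and the EAD URL is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def aeon_request_url(request_url: str, request_mappings: str, ead_url: str) -> str:
    """Build an Aeon request link from the repositories.yml settings.

    Any mapping whose value is the literal string "ead_url" is replaced
    by the actual EAD download URL; other values pass through unchanged.
    """
    pairs = parse_qsl(request_mappings, keep_blank_values=True)
    resolved = [(key, ead_url if value == "ead_url" else value) for key, value in pairs]
    # urlencode percent-encodes the EAD URL so it survives as a single parameter.
    return f"{request_url}?{urlencode(resolved)}"


print(aeon_request_url(
    "https://sample.request.com",
    "Action=10&Form=31&Value=ead_url",
    "http://localhost:3000/public/eads/sc0097.xml",
))
# https://sample.request.com?Action=10&Form=31&Value=http%3A%2F%2Flocalhost%3A3000%2Fpublic%2Feads%2Fsc0097.xml
```

Adding a second template variable (for a separate Aeon-only EAD path) would amount to another value-to-URL rule in the same replacement step.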

taylor-steve commented 6 months ago

I now have access to the aeon.xslt file being used in prod. It seems all we need to do to convert ASpace EADs into a format acceptable for the OAC Aeon setup is:

I haven't actually tried to submit a request, but the rendered forms are exactly the same between ASpace and OAC, with the exception of some minor data differences where presumably ASpace has diverged from what was loaded in OAC.

taylor-steve commented 6 months ago

Here's the current aeon.xslt file from AtlasSystems-Prod/hosting-aeon-stanford for the current OAC request workflow: https://gist.github.com/taylor-steve/1e04d1f31958b398e42bb2b363c1b930

Here's how I'm currently converting c elements to cXX elements: https://gist.github.com/taylor-steve/5afb65e43580bf96716b28628ee87c4d
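The gist is the reference; as a standalone illustration, here is a Python sketch of the same idea: rename each unnumbered c element to c01, c02, and so on according to its nesting depth. It assumes namespaces have already been removed and that nesting never exceeds c12, the deepest numbered component EAD 2002 defines. (A later comment in this issue notes that ASpace's numbered_cs export option makes this step unnecessary.)

```python
import xml.etree.ElementTree as ET

MAX_DEPTH = 12  # EAD 2002 defines numbered components c01 through c12


def number_components(parent: ET.Element, depth: int = 1) -> None:
    """Rename unnumbered <c> elements to c01, c02, ... by nesting depth.

    Non-<c> elements (ead, archdesc, dsc, did, ...) are walked without
    changing the depth, so the first <c> level under <dsc> becomes c01.
    """
    for child in parent:
        if child.tag == "c":
            if depth > MAX_DEPTH:
                raise ValueError(f"component nesting deeper than c{MAX_DEPTH}")
            child.tag = f"c{depth:02d}"
            number_components(child, depth + 1)
        else:
            number_components(child, depth)


ead = ET.fromstring(
    "<ead><archdesc><dsc>"
    "<c><did/><c><c/></c></c><c/>"
    "</dsc></archdesc></ead>"
)
number_components(ead)
print([el.tag for el in ead.iter()])
# ['ead', 'archdesc', 'dsc', 'c01', 'did', 'c02', 'c03', 'c01']
```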

I'm stripping the namespaces with Nokogiri.
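Nokogiri provides Document#remove_namespaces! for this. For readers without a Ruby setup, the equivalent operation with Python's ElementTree looks roughly like this; it is a sketch of the technique, not the code used in the app, and the image URL is a placeholder.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Drop '{namespace}' prefixes from element and attribute names in place.

    ElementTree stores a namespaced name as '{uri}local', e.g.
    '{urn:isbn:1-931666-22-9}ead' or '{http://www.w3.org/1999/xlink}href'.
    """
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
        for name in list(el.attrib):  # copy the keys because attrib is mutated below
            if name.startswith("{"):
                el.attrib[name.split("}", 1)[1]] = el.attrib.pop(name)
    return root


ead = ET.fromstring(
    '<ead xmlns="urn:isbn:1-931666-22-9" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<dao xlink:href="https://example.org/image.jpg"/></ead>'
)
strip_namespaces(ead)
print(ET.tostring(ead, encoding="unicode"))
# <ead><dao href="https://example.org/image.jpg" /></ead>
```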

taylor-steve commented 6 months ago

A decision has been made to first attempt modifying the EAD XML locally in ArcLight@SUL to match the format expected by the aeon.xslt file we currently use for OAC requests. As part of this issue, we also researched what it would take to modify the workflow on the Aeon side so that no local customizations to ArcLight@SUL are needed. The following is a summary of what we found, should this work be picked up again.

The current aeon.xslt file is not configured to handle the default EAD namespace. OAC appears to strip namespaces as part of the submission process: https://github.com/cdlib/dsc-oac-voro/blob/a46592a5e971dcdaf5ce62a5bd0ddd4ecba7dca3/xslt/Remove-Namespaces.xsl

The previously mentioned issue of unnumbered c elements has been resolved by enabling the numbered_cs option when fetching the EAD from ASpace: https://github.com/sul-dlss/stanford-arclight/pull/499
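For reference, a Python sketch of building that export request. It assumes the ArchivesSpace backend endpoint GET /repositories/:repo_id/resource_descriptions/:id.xml, which accepts a numbered_cs parameter, and a session token passed in the X-ArchivesSpace-Session header; the host, repository ID, resource ID, and token are hypothetical, and this is not the code from the linked pull request. Only the request object is built; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def ead_export_request(api_base: str, repo_id: int, resource_id: int, session: str) -> Request:
    """Build (but do not send) an EAD export request with numbered c0N components."""
    query = urlencode({"numbered_cs": "true"})
    url = (
        f"{api_base.rstrip('/')}/repositories/{repo_id}"
        f"/resource_descriptions/{resource_id}.xml?{query}"
    )
    return Request(url, headers={"X-ArchivesSpace-Session": session})


req = ead_export_request("https://aspace.example.edu/api", 2, 123, "SESSION-TOKEN")
print(req.full_url)
# https://aspace.example.edu/api/repositories/2/resource_descriptions/123.xml?numbered_cs=true
```

Passing req to urllib.request.urlopen would perform the actual fetch against a live backend.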

I have created a second aeon.xslt file that does handle the default namespace: https://gist.github.com/taylor-steve/377627777849dc199a65c135e298d764. It has been spot-tested with a handful of EADs but would need more rigorous testing before being put into production.

Aeon can be configured to use multiple XSLT files, mapping to a specific file based on the content of the EAD XML: https://support.atlas-sys.com/hc/en-us/articles/1500002904302-The-EADMapping-Table

Our intent was to add the second aeon.xslt file (e.g., aeon_with_namespaces.xslt) and use that mapping table to direct ASpace EADs to it, leaving OAC EADs to be processed by the original aeon.xslt.
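The routing rule we had in mind can be sketched in Python as below. This is only the decision logic, not how Aeon's EADMapping table is actually configured (see the Atlas article for that); it assumes ASpace exports declare the EAD 2002 default namespace while OAC submissions arrive with namespaces already stripped, and it uses the example file name aeon_with_namespaces.xslt.

```python
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 default namespace


def choose_xslt(ead_xml: str) -> str:
    """Pick a stylesheet for an incoming EAD, mirroring the intended mapping.

    ASpace exports declare the EAD default namespace; OAC submissions
    appear to have their namespaces stripped, leaving a bare <ead> root.
    """
    root = ET.fromstring(ead_xml)
    if root.tag == f"{{{EAD_NS}}}ead":
        return "aeon_with_namespaces.xslt"
    return "aeon.xslt"


print(choose_xslt('<ead xmlns="urn:isbn:1-931666-22-9"><eadheader/></ead>'))
# aeon_with_namespaces.xslt
print(choose_xslt("<ead><eadheader/></ead>"))
# aeon.xslt
```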

There are two Aeon GitHub repositories, https://github.com/AtlasSystems/hosting-aeon-stanford and https://github.com/AtlasSystems-Prod/hosting-aeon-stanford, and access can be requested from Atlas; I was added quickly after filing a support ticket with them. Changes to the aeon.xslt files are made through pull requests, working with Atlas to get them merged. Atlas support offered to walk us through the process, and there is also documentation on their website.