While archiving library.stanford.edu with browsertrix-crawler, we discovered that embedded SDR viewers on blog posts don’t display and show an error instead. See:
https://swap.stanford.edu/was/20230505152659/https://library.stanford.edu/blogs/special-collections-unbound/2022/11/born-digital-collections-opened-research-2022
A crawl of the same page appears to work in Archive-It, but it is showing the embeds from the live web rather than from the capture, which is missing those resources. Here’s what the browser dev tools Network panel looks like when viewing the Archive-It capture:
Some of the embedded iframe has been captured, but it looks like some resources critical for rendering may not have been. For example, https://purl.stanford.edu/pq546tq4448/iiif/manifest is not going through swap. The browser does not load these resources on page load, only when the embedded viewers scroll into view:
https://github.com/sul-dlss/was-pywb/assets/33829/b7a145e0-62dc-4fa6-9214-bf939a8abd0f
So it appears that browsertrix-crawler was not configured to scroll the page?
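
To make it easier to spot requests that bypass the archive, like the manifest mentioned above, here is a rough sketch that filters a list of request URLs (for example, copied out of the Network panel) down to the ones not served from swap. The helper names and the sample list are hypothetical and only for illustration; the check assumes that replayed resources are always requested from the swap.stanford.edu host.

```python
from urllib.parse import urlparse

# Assumption: every replayed resource is served from this host.
ARCHIVE_HOST = "swap.stanford.edu"


def is_replayed(request_url, archive_host=ARCHIVE_HOST):
    """Return True if the request was made to the archive host."""
    return urlparse(request_url).hostname == archive_host


def leaked_requests(request_urls, archive_host=ARCHIVE_HOST):
    """Return the requests that went somewhere other than the archive."""
    return [u for u in request_urls if not is_replayed(u, archive_host)]


# Illustrative sample, not the full list of requests from the real capture.
requests_seen = [
    "https://swap.stanford.edu/was/20230505152659/https://library.stanford.edu/"
    "blogs/special-collections-unbound/2022/11/born-digital-collections-opened-research-2022",
    "https://purl.stanford.edu/pq546tq4448/iiif/manifest",
]

for url in leaked_requests(requests_seen):
    print("not going through swap:", url)
```

Running this before and after scrolling the page in the replay should show whether the missing requests only appear once the embeds come into view.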