thegetty / quire

A multi-package repository for the Quire multiformat publishing framework
https://quire.getty.edu/
BSD 3-Clause "New" or "Revised" License
89 stars 10 forks source link

Image Directory through CDN: src URL is being converted to a relative URL #863

Open Erin-Cecele opened 9 months ago

Erin-Cecele commented 9 months ago

Discussed in https://github.com/thegetty/quire/discussions/861

Originally posted by **rrolonNelson** October 10, 2023 We are delivering our images from a CDN with images stored in a blob storage, but have encountered an issue where the src URL is being converted to a relative URL. config.yaml imageDir: 'https://blobcdn.nelson-atkins.org/wpmediablob/Starr' figures.yaml src: figs/5224_c1.png desired outcome https://blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png actual outcome https:/blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png which resolves to {URL of current page}/https:/blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png In build and preview, we have the same issue when you look at the image src URL but in preview, the images will be visible. (We are using the fix for subdirectories in a build where we had to change the file in the node_modules) Preview: ![preview-path-src](https://github.com/thegetty/quire/assets/103956133/3c8e16d9-2c75-4c7a-8e32-e1f58ec1896a) Build: ![build-path-src](https://github.com/thegetty/quire/assets/103956133/884e92f7-6664-41e1-a42a-8faee4ff418e) I think the issue comes from using path.join() which will remove any '//'. Based on the code in `image-tag.js` it looks like there is a conditional that checks if the src from figures.yaml starts with an HTTP. To resolve the issue in the light box and inline images we opted to have all the image sources in figures.yaml to have the absolute URL however this seems to break the images on the table of content cards since in `table-of-contents/item/image.js` the check for if the src starts with HTTP is not there so it will join the imageDir and src regardless of if it starts with an HTTP and make it a path also. thumbnail_Screen Shot 2023-10-04 at 3 50 07 PM thumbnail_Screen Shot 2023-10-04 at 3 50 42 PM config.yaml imageDir: 'https://blobcdn.nelson-atkins.org/wpmediablob/Starr' figures.yaml src: figs/5224_c1.png desired outcome https://blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png actual outcome on table of contents https:/blobcdn.nelson-atkins.org/wpmediablob/Starr/https:/blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png which resolves to {URL of current page}https:/blobcdn.nelson-atkins.org/wpmediablob/Starr/https:/blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png Build: ![toc-image-with-absurl-src](https://github.com/thegetty/quire/assets/103956133/2501cf92-82d4-4595-99cb-323c2cd4fc78) Is there a correct way to use images from a blob/cdn or should we endeavor to add the HTTP check to all the places where image src are generated or add a package to join an absolute URL with a path without turning them into relative URLs?
Erin-Cecele commented 9 months ago

Thank you @rrolonNelson, for starting this discussion #861. I have now moved it to an official issue, which we will review with our software engineers.

Thank you also to @cbutcosk for your contribution to helping us understand this issue better:

+1, that section of image-tag.js also causes issues if you're pointing to an internal IIIF resource like a staticInlineImage, which also won't be in the asset-images directory and so joining the path produces image 404s

https://github.com/thegetty/quire/discussions/861#discussioncomment-7253418

mphstudios commented 9 months ago

As a straw man to capture requirements I wrote the following example test cases

const assert = require('node:assert')
const path = require('node:path')
const test = require('node:test')

// "mock" Quire configuration
const config = {
  baseURL: 'https://nelson-atkins.org',
  cdnURL: 'https://blobcdn.nelson-atkins.org/wpmediablob/',
  imageDir: '/assets/images/'
}

/**
 * Resolve the full path for an image
 * Any valid src URI, including using a 'file:' scheme, is passed through unchanged.
 * Relative src paths are qualified with a base URL: either the baseURL or cdnURL.
 *
 * Nota bene: this code is a "strawman" for handling image path use cases,
 * the pattern for integrating this into the current Quire code is TBD...
 */
function resolveImageSrc (src) {
  const cdn = src.startsWith('cdn:')
  src = cdn ? src.slice(4) : src
  let url;
  try {
    url = new URL(src)
  } catch (TypeError) {
    const base = cdn 
      ? config.cdnURL
      : path.join(config.baseURL, config.imageDir);
    url = new URL(src, base)
  } finally {
    return url
  }
}

test('fully qualified URI', () => {
  const src = resolveImageSrc('https://blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png')
  assert.equal(src.href, 'https://blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png')  
})

test('path using file: scheme', () => {
  const src = resolveImageSrc('file:Starr/figs/5224_c1.png')
  assert.equal(src.href, 'file:///Starr/figs/5224_c1.png')  
})

test('relative path', () => {
  const src = resolveImageSrc('Starr/figs/5224_c1.png')
  assert.equal(src.href, 'https://nelson-atkins.org/assets/images/Starr/figs/5224_c1.png')  
})

test('path using a pseudo-scheme', () => {
  const src = resolveImageSrc('cdn:Starr/figs/5224_c1.png')
  assert.equal(src.href, 'https://blobcdn.nelson-atkins.org/wpmediablob/Starr/figs/5224_c1.png')  
})
cbutcosk commented 9 months ago

@mphstudios Thanks for a peak into a possible solution. That said from my perspective this seems too much complexity for an img tag.

Amending the existing but buggy check in the image tag to only prepend the image asset path for relative file paths would have solved this issue for me when I encountered it (which coincidentally also involved a possibility of emitting actual file:// URLs in img tag src attributes, so please don't overload that URL protocol!).

rrolonNelson commented 9 months ago

I am working on this today. I added the check from image-tag.js to table-of-contents/item/image.js and am now able to have the images visible in the table of contents images. I will get a PR up later today for review. However, I am unsure if there are other areas where the img src is being built like this that might cause issues in other places.

rrolonNelson commented 9 months ago

I made a pull request and am open to feedback. I tried to also get the alt text to be added in but it doesn't seem to want to work https://github.com/thegetty/quire/pull/874

mphstudios commented 8 months ago

@rrolonNelson #874 looks good, thank you for submitting the fix; I opened #878 to set the figure image alt text.

Erin-Cecele commented 8 months ago

Citing @mphstudios's comment in PR #874

@rrolonNelson reviewing this solution locally we discovered that the change will break images for the epub output; until we can revisit the solution, if you do not need epub output, the change in this pull-request can be made directly to a local Quire publication.

This issue has not been resolved for all formats and will remain on backlog for the time being.