jackyzha0 / quartz

🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
https://quartz.jzhao.xyz
MIT License
6.89k stars 2.48k forks

Missing ".html" suffixes for hosting on AWS S3 #863

Closed deeplook closed 8 months ago

deeplook commented 8 months ago

I'm experimenting with hosting an exported Obsidian vault in an AWS S3 bucket, but I found that hyperlinks are broken: Quartz generates href attributes with just the page name, while the generated files carry an additional .html extension, so the links 404 when clicked.

I thought there might be an option to add .html suffixes to hyperlinks, but the build command doesn't provide one yet.

A tedious workaround would be to postprocess the HTML files and append .html to all hyperlinks, but most people can't be expected to do that. I suspect this issue comes up with other hosting providers, too.

Maybe there is a chance to add such a flag to the build command?

aarnphm commented 8 months ago

Quartz uses clean URLs (links without the .html suffix).

And we have documentation on this for most hosting providers.

Fwiw I haven't tried hosting on blob storage yet, but most other providers work afaik.

aarnphm commented 8 months ago

I'm going to close this since there is already a discussion about it on Discord.

deeplook commented 8 months ago

It might be helpful to add an entry about AWS S3 to https://quartz.jzhao.xyz/hosting with reasons for not recommending it.

riccardo94p commented 7 months ago

Hello, I experienced the same problem and solved it by adding the .html suffix to all relevant href attributes as part of the deployment process. You can try the following code:

for file in ./public/*.html; do
        sed -i 's/<a href="\(\.\/[^"]*\)"\([^>]*\)>/<a href="\1.html"\2>/g' "$file"
done

In my case, I'm using GitHub Actions to build and deploy to S3. The above command runs after npx quartz build. Hope it helps!
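
For anyone wanting to dry-run the substitution before touching a real build, here is a quick check on a throwaway file (the file name and link are made up for illustration; GNU sed is assumed for `-i`):

```shell
# Dry-run the rewrite on a sample page (hypothetical file and link).
mkdir -p public
printf '%s\n' '<p><a href="./notes" class="internal">Notes</a></p>' > public/sample.html

# Same substitution as in the comment above.
sed -i 's/<a href="\(\.\/[^"]*\)"\([^>]*\)>/<a href="\1.html"\2>/g' public/sample.html
grep -o 'href="[^"]*"' public/sample.html   # prints href="./notes.html"
```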

a1anw2 commented 6 months ago

This is exactly the issue I am facing. The build script creates the .html files, but clearly the local webserver is translating those links to .html automagically.

It would be VERY useful to add this to the wiki. Hosting on S3/CloudFront is not unreasonable and, outside of this broken-link issue, works very well (and for next to nothing).

Thank you @riccardo94p for the sed script.

rambip commented 5 months ago

There is just a bug in the script above: you must also map href="/" and href="." to index.html links, otherwise those links will not work.

So the script looks like:

for file in ./public/*.html; do
        # Suffix relative page links first, so the ./index.html produced
        # by the "." rule below is not rewritten a second time.
        sed -i 's/<a href="\(\.\/[^"]*\)"\([^>]*\)>/<a href="\1.html"\2>/g' "$file"
        sed -i 's/<a href="\/"\([^>]*\)>/<a href="\/index.html"\1>/g' "$file"
        sed -i 's/<a href="\."\([^>]*\)>/<a href="\.\/index.html"\1>/g' "$file"
done

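
A quick way to sanity-check all three rewrites on sample links (the file and links are made up; note the relative-link rule is applied first here, so the ./index.html produced by the "." rule does not get a second .html suffix):

```shell
# Build a sample page covering the three link shapes (hypothetical).
mkdir -p public
cat > public/sample.html <<'EOF'
<a href="/">home</a>
<a href=".">here</a>
<a href="./posts/foo">post</a>
EOF

# Relative page links first, then the root and "." special cases.
sed -i 's/<a href="\(\.\/[^"]*\)"\([^>]*\)>/<a href="\1.html"\2>/g' public/sample.html
sed -i 's/<a href="\/"\([^>]*\)>/<a href="\/index.html"\1>/g' public/sample.html
sed -i 's/<a href="\."\([^>]*\)>/<a href="\.\/index.html"\1>/g' public/sample.html

grep -o 'href="[^"]*"' public/sample.html
# prints:
#   href="/index.html"
#   href="./index.html"
#   href="./posts/foo.html"
```
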
rambip commented 5 months ago

~I think it may be pretty easy to do that with an HTML or Markdown plugin.~

I found a very easy way to do it!

You just have to add the .html part when quartz resolves the links. Currently the function is fairly simple: https://github.com/jackyzha0/quartz/blob/d03fdc235a7926eed5ad127ffb9c4a5f9c1008b7/quartz/util/path.ts#L83

What I propose is replacing this function with:

export function simplifySlug(fp: FullSlug): SimpleSlug {
  const res = stripSlashes(trimSuffix(fp, "index"), true)
    .replace(/^(([^#\/]\/)*[^#\.\/]+)(#[^\/]*)?$/, '$1.html$3')
  return (res.length === 0 ? "/" : res) as SimpleSlug
}

I'm not sure it is the best way to do it, but at least it works and backlinks are not broken.

elliottw commented 3 months ago

The proposed simplifySlug change works for me for bare links to pages like http://localhost:8080/link.html, but doesn't work when they are nested like http://localhost:8080/posts/link.html. In the second case, posts/ is removed.
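
The nested-path behaviour can be reproduced outside Quartz. In the sketch below, addHtml is a standalone stand-in for the .replace call in the proposed simplifySlug (Node required); it shows the regex only ever matches single-segment slugs, so nested slugs never receive the suffix:

```shell
# Reproduce the nested-path problem with the regex from the proposal.
cat > check.js <<'EOF'
const addHtml = (slug) =>
  slug.replace(/^(([^#\/]\/)*[^#\.\/]+)(#[^\/]*)?$/, '$1.html$3');

console.log(addHtml('link'));       // link.html
console.log(addHtml('posts/link')); // no match: ([^#\/]\/)* only accepts
                                    // single-character path segments, so
                                    // nested slugs are left without .html
EOF
node check.js
```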

mcordell commented 2 months ago

For anyone who runs across this later, I was able to get it working with CloudFront and S3. First you have to solve the case-sensitivity problem by modifying quartz like this. Then you need to deploy to S3 and create a corresponding CloudFront distribution. Finally, you have to create a CloudFront function; follow the tutorial steps to get it created and assigned to your distribution. Here is the actual function you should use instead of the tutorial boilerplate:

'use strict';

function getFileExtension(uri) {
  var path = uri.split('?')[0].split('#')[0];

  var segments = path.split('/');
  var fileName = segments[segments.length - 1];

  var lastDotIndex = fileName.lastIndexOf('.');

  if (lastDotIndex < 1) {
    return '';
  }

  return fileName.slice(lastDotIndex + 1).toLowerCase();
}

const knownExt = {
  'canvas': true,
  'css': true,
  'html': true,
  'js': true,
  'json': true,
  'png': true,
  'xml': true,
  'jpeg': true,
  'jpg': true,
  'gif': true
};

function handler(event) {
    var request = event.request;
    var uri = request.uri;
    var ext = getFileExtension(uri);
    if (uri.endsWith('/')) {
        request.uri += 'index.html';
    }
    else if (!knownExt[ext]) {
        request.uri += '.html';
    }
    return request;
}

Note that the above uses the CloudFront Functions JavaScript runtime, which is a limited subset of the language, so the extension check and the list of known extensions are crude and may not cover all cases. If you serve other file types, add them to the knownExt hash.
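
The routing logic can be checked locally with plain Node before deploying (this is a condensed restatement of the function above, not the deployed code, and the event object is a minimal stand-in for a real CloudFront viewer-request event):

```shell
# Local sanity check of the URI rewriting (Node stands in for the
# CloudFront Functions runtime; event shape is a minimal stub).
cat > handler.js <<'EOF'
var knownExt = { css: true, html: true, js: true, json: true, png: true,
                 jpg: true, jpeg: true, gif: true, xml: true, canvas: true };

function getFileExtension(uri) {
  var fileName = uri.split('?')[0].split('#')[0].split('/').pop();
  var dot = fileName.lastIndexOf('.');
  return dot < 1 ? '' : fileName.slice(dot + 1).toLowerCase();
}

function handler(event) {
  var request = event.request;
  if (request.uri.endsWith('/')) request.uri += 'index.html';
  else if (!knownExt[getFileExtension(request.uri)]) request.uri += '.html';
  return request;
}

console.log(handler({ request: { uri: '/notes/' } }).uri);     // /notes/index.html
console.log(handler({ request: { uri: '/notes/page' } }).uri); // /notes/page.html
console.log(handler({ request: { uri: '/style.css' } }).uri);  // /style.css
EOF
node handler.js
```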