jackyzha0 / quartz

🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
https://quartz.jzhao.xyz
MIT License

Configuring Apache #1459

Open plagasul opened 3 weeks ago

plagasul commented 3 weeks ago

I need help configuring Apache to work with Quartz.

Steps followed:

  1. Build quartz locally using npx quartz build
  2. Upload public folder contents via ftp to shared hosting subfolder (www.example.com/quartz/)
  3. No .htaccess file for starters

What happens:

  1. Create a .htaccess file with the contents suggested in this issue.

What happens:

It would be great if the docs could provide some help for configuring Apache, as they do for configuring NGINX, etc.

Thank you.

saberzero1 commented 2 weeks ago

Can you push the built site quartz/public instead of quartz?

plagasul commented 2 weeks ago

@saberzero1 What I pushed via FTP is the content of Quartz's public folder, but I put it into a subfolder that, for the sake of simplicity, I called "www.example.com/quartz"; it could just as well be "www.example.com/meh".

But yes, what I pushed is the content of the public folder: just the content, without the folder itself.

saberzero1 commented 2 weeks ago

Can you push the public folder to <your URL>/public?

plagasul commented 2 weeks ago

Yes, I will try this, but what is the rationale behind it? Does Quartz care about the folder it is in? I know there is a baseurl option.

Thank you

saberzero1 commented 2 weeks ago

Yeah, the NGINX configuration instructions specify the public folder by default. You said you used these instructions, so I responded based on that.

saberzero1 commented 2 weeks ago

Okay, I need to read. You use Apache. Gimme a second.

saberzero1 commented 2 weeks ago
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /path/to/quartz/public

    DirectoryIndex index.html

    ErrorDocument 404 /404.html

    <Directory /path/to/quartz/public>
        Options Indexes FollowSymLinks
        AllowOverride None
        Require all granted
    </Directory>

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ $1.html [L]
</VirtualHost>

Make sure to enable the required Apache module (mod_rewrite) as well.

plagasul commented 1 week ago

Thank you @saberzero1 for your help.

As I am using the .htaccess file of the shared hosting, I tried to adapt your answer to .htaccess syntax, like this:

# Enable the rewrite engine
RewriteEngine On

# Set the default file to serve for directories
DirectoryIndex index.html index.htm index.php 

# Custom 404 error page
ErrorDocument 404 /404.html

# Directory options
Options FollowSymLinks
# AllowOverride None
Require all granted

# Rewrites
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ $1.html [L]
  1. AllowOverride is commented out because it produced a 500 error.
  2. Indexes is removed from Options because it caused problems loading index.html.
  3. Accessing www.domain.net/path/to/quartz/ now works as intended. Thank you.
  4. I noticed another problem that was making it difficult to troubleshoot my original .htaccess problems: any file (so, any note) whose title contains Spanish 'tildes' (accents), such as Introducción (note the ó) or Práctica (note the á), returns a server error. The same happens to all files within folders whose title contains a 'tilde'.

Could this be Quartz-related? I haven't encountered this problem before.

Thank you

saberzero1 commented 1 week ago

It could be a few things. Can you check something for me:

On any note that links to a note with an accent, does the link contain the special characters? (You can check this with Inspect Element in Chrome, or by hovering over the link.)

saberzero1 commented 1 week ago

Also, as you're using .htaccess, does the script in https://github.com/jackyzha0/quartz/issues/1079#issuecomment-2049772686 work?

saberzero1 commented 1 week ago

I should probably update the docs for .htaccess...

plagasul commented 1 week ago

Thanks @saberzero1

  1. On any note that links to a note with an accent, the link DOES contain the accent, for example:
<a href="./Autonomía" class="internal alias" data-slug="Autonomía">Autonomía</a>
  2. The .htaccess from #1079 does load the pages with accents, but it completely breaks the styling, as if no CSS were loaded.
  3. There are several errors in the console:
Uncaught SyntaxError: Unexpected token '<' (at prescript.js:1:1)
Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/html". Strict MIME type checking is enforced for module scripts per HTML spec. postscript.js:1
VM410:1 Uncaught (in promise) SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

Let me know what else I can do to troubleshoot. If you want, I can send you the URL of this site privately.

Thank you very much
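The console errors above fit a simple model of the #1079 rules (serve `<path>.html` when it exists, otherwise fall back to `index.html` for anything that is neither a file nor a directory): a script requested from the wrong folder comes back as HTML, and the browser chokes on the leading `<`. The sketch below is a hypothetical TypeScript model of that rule order, with in-memory sets standing in for the files on disk; it is not how Apache is implemented, and `servedFile` is an illustrative name only.

```typescript
// Hypothetical in-memory model of the #1079 .htaccess rules.
// `files` and `dirs` stand in for what exists under the Quartz output folder.
function servedFile(path: string, files: Set<string>, dirs: Set<string>): string {
  // Rule 1: not a directory, and "<path>.html" exists -> serve that HTML file.
  if (!dirs.has(path) && files.has(`${path}.html`)) return `${path}.html`
  // Rule 2: neither a file nor a directory -> fall back to the folder's index.html.
  if (!files.has(path) && !dirs.has(path)) return "index.html"
  // Otherwise the request is served as-is.
  return path
}

const files = new Set(["index.html", "index.css", "prescript.js", "5.-Glosario/Asset.html"])
const dirs = new Set(["5.-Glosario"])

console.log(servedFile("5.-Glosario/Asset", files, dirs)) // "5.-Glosario/Asset.html"
console.log(servedFile("prescript.js", files, dirs)) // "prescript.js"
console.log(servedFile("5.-Glosario/prescript.js", files, dirs)) // "index.html"
```

In this model, any asset requested relative to the wrong folder silently receives HTML instead of a 404, which would produce exactly the "Unexpected token '<'" and MIME-type errors reported.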

saberzero1 commented 1 week ago

Are you hosting from a non-root folder? The index.html is looking for files that are in another folder. Say you are hosting from https://example.com/folder/subfolder/quartz/public: you need to include that path in your rewrite rules. Also make sure the rewrite engine is turned on. Add this to the top of your .htaccess:

RewriteEngine On
# Change this to the path of your public folder
RewriteBase /
plagasul commented 1 week ago

Both .htaccess files already have RewriteEngine On at the top.

Yes, I am serving from a subfolder. I assume you mean adding the path to the public folder to the .htaccess from the other issue (the one that seems to break the CSS), like this:

# Enable the rewrite engine
RewriteEngine On
RewriteBase /path/to/public

# Serve 404 page when a file or directory is not found
ErrorDocument 404 /404.html

# Serve the requested resource as .html if it exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html   

# Serve index.html if it exists, otherwise rewrite to the file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.html

It does not fix the style issue, though.

I tried several ways of writing the base path; perhaps you can help me.

My domain example.net points to /public_html/_thisfolder

And quartz files are located at /public_html/_thisfolder/dev/this_wiki

Is, then, this line correct, slashes and everything?:

RewriteBase /dev/this_wiki

Please note that .htaccess is placed at /public_html/_thisfolder/dev/this_wiki

Thanks
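For reference, the Apache mod_rewrite documentation describes RewriteBase as a URL prefix (what visitors see after the domain), not a filesystem path. A small sketch, assuming the document root and folder quoted above, derives that URL prefix by stripping the document root from the folder path; `urlPrefixFor` is a hypothetical helper for illustration only.

```typescript
// Hypothetical helper: turn a folder on disk into the URL prefix it is served under,
// given the directory the domain points to (its document root).
function urlPrefixFor(documentRoot: string, folder: string): string {
  const root = documentRoot.replace(/\/+$/, "") // drop trailing slashes from the root
  if (folder !== root && !folder.startsWith(root + "/")) {
    throw new Error(`${folder} is not inside ${root}`)
  }
  // Keep the part after the document root and make sure it starts and ends with "/".
  return folder
    .slice(root.length)
    .replace(/^\/*/, "/")
    .replace(/\/*$/, "/")
}

console.log(urlPrefixFor("/public_html/_thisfolder", "/public_html/_thisfolder/dev/this_wiki"))
// "/dev/this_wiki/"
```

Under that reading, the layout described above maps to the URL prefix /dev/this_wiki/, not to the /public_html/... filesystem path.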

saberzero1 commented 1 week ago

What do you mean by both .htaccess files? Do you have multiple?

If your index.html is in the same folder as your .htaccess, I would assume RewriteBase /.

plagasul commented 1 week ago

I mean that both the .htaccess I wrote AND the one from the issue you linked already have RewriteEngine On at the top. So that is covered.

Yes, my index.html is in the same folder as my .htaccess

I tried RewriteBase / and that does not fix the CSS, but it breaks many pages, such as root pages, which get redirected to example.net.

I can't even tell which pages are sent to example.net and which are not, but yes, accents are still an issue.

Again, if you want, I can privately provide the link to the site. Thank you.

saberzero1 commented 1 week ago

I have no idea how your VPS is set up and what is currently broken. Your RewriteBase should point to your index.html root file, relative to the site root.

My domain example.net points to /public_html/_thisfolder

And quartz files are located at /public_html/_thisfolder/dev/this_wiki

Is, then, this line correct, slashes and everything?:

RewriteBase /dev/this_wiki

Should be RewriteBase /public_html/_thisfolder/dev/this_wiki/ (perhaps without the trailing slash, not sure)

Besides that, you might need to do some more rewrites to non-html sources. We'll figure that out after we fix this part.

plagasul commented 1 week ago

Ok, thank you for helping with such an annoying config. This is shared hosting by the way, at HostGator.

This htaccess:

# Enable the rewrite engine
RewriteEngine On
RewriteBase /public_html/_thisfolder/dev/this_wiki/

# Serve 404 page when a file or directory is not found
ErrorDocument 404 /404.html

# Serve the requested resource as .html if it exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html   

# Serve index.html if it exists, otherwise rewrite to the file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.html

The root Quartz folder works. Links to other root notes work. Links to notes without accents, under folders without accents, work. Links to notes with accents, OR under folders with accents, redirect to example.net.

plagasul commented 1 week ago

I would say it works better without any RewriteBase, as all pages load, although accented ones break the styles. With every RewriteBase I tried, many pages redirect to example.net, even when clicking on index.

saberzero1 commented 1 week ago

Alright. So then we don't use RewriteBase. Fine with me. Don't fix it if it ain't broke.

As for the accented characters, can you test this rewrite rule?

RewriteRule ^([\w\-\%]+)$ $1.html [NE,L]

NE (noescape) tells Apache not to escape special characters in the rewritten URL (for example, a space would otherwise become %20).
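As a rough check of which request paths `^([\w\-\%]+)$` can match, here is the same character class in TypeScript. This assumes JavaScript's regex engine behaves like Apache's PCRE here (both treat `\w` as ASCII-only by default) and that the rule sees the decoded path relative to the .htaccess folder; the sample paths are taken from this thread.

```typescript
// Same character class as the suggested RewriteRule: word characters, '-' and '%'.
const pattern = /^([\w\-%]+)$/

// Sample request paths (relative to the .htaccess folder), taken from this thread.
const samples = ["Asset", "Autonomía", "5.-Glosario/Asset", "5.-Glosario/Contemplación"]

for (const path of samples) {
  // `\w` is ASCII-only here, so accented letters, '.' and '/' all fail the match.
  console.log(path, pattern.test(path))
}
// Asset true
// Autonomía false
// 5.-Glosario/Asset false
// 5.-Glosario/Contemplación false
```

If that assumption holds for Apache too, the rule only ever matches top-level, ASCII-only note names, which would be consistent with it making no difference for accented notes.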

plagasul commented 1 week ago

Anywhere in particular in .htaccess?

saberzero1 commented 1 week ago

Anywhere in particular in .htaccess?

Probably instead of RewriteRule ^(.*)$ $1.html

plagasul commented 1 week ago

Ok, that did not seem to make any difference.

Meanwhile I noticed something that seems relevant.

When I attempt to access either a file with an accent in a folder without one, or any file within a folder with an accent, the requests for index.css, prescript.js, postscript.js, contentIndex.json and other similar files do not use the Quartz root but the folder the file is in.

For example:

  1. example.net/this_wiki/5.-Glosario/Asset correctly requests those files at this_wiki/
  2. example.net/this_wiki/5.-Glosario/Contemplación mistakenly requests those files at this_wiki/5.-Glosario/
  3. example.net/this_wiki/4.-Metodología/Tablas mistakenly requests those files at this_wiki/4.-Metodología/

Why would the accent cause this? This is very strange. Do you consider this solely an .htaccess problem, or could some aspect of how Quartz works be part of it?

Thanks

saberzero1 commented 1 week ago

Alright, I have no idea if this is going to work, but I have asked ~the AI hivemind~ GPT-4o.

RewriteEngine On

# Serve the requested resource as .html if it exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^([\w\-\%]+)$ /dev/this_wiki/$1.html [NE,L]

# Serve index.html if it exists, otherwise rewrite to the file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /dev/this_wiki/index.html [L]

DirectoryIndex index.html
ErrorDocument 404 /404.html

Options +FollowSymLinks
plagasul commented 1 week ago

I am afraid the hivemind is hallucinating; it breaks a lot. I also tried changing the order a bit, which does not help.

But bear with me for a second. If I open the .html files produced by Quartz, the path to index.css (and similar) is href="./index.css" for root notes and href="../index.css" for notes in subfolders, as it should be.

But when I visit the same notes in subfolders in the browser and inspect <head>, if their title or their parent folder title have accents, the same link shows href="./index.css"

How the bleep do paths of links in <head> change on load?

Thank you
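How the browser resolves the two hrefs mentioned above (./index.css and ../index.css) against a subfolder URL can be checked with the standard `URL` constructor. The page URL reuses the thread's example path; the idea that an accented request might actually be answered with the root index.html (whose head uses ./index.css) is an assumption, not something confirmed in the thread.

```typescript
// Page URL from the thread: a note one folder below the Quartz root.
const page = "https://example.net/this_wiki/5.-Glosario/Contemplación"

// What the subfolder note's own HTML asks for:
const fromSubfolderHtml = new URL("../index.css", page).pathname
// What a root-level HTML document (e.g. index.html) asks for, if it is served at this URL:
const fromRootHtml = new URL("./index.css", page).pathname

console.log(fromSubfolderHtml) // "/this_wiki/index.css"
console.log(fromRootHtml) // "/this_wiki/5.-Glosario/index.css"
```

The second result matches the misdirected requests described earlier in the thread, which is why it may be worth ruling out that the fallback rule is serving index.html for these URLs.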

saberzero1 commented 1 week ago

Oh that's simple: .htaccess rewrites those paths.

On another note, can you try this? W3C recommends it:

AddCharset UTF-8 .html
plagasul commented 1 week ago

That does not do anything. I also tried AddDefaultCharset UTF-8.

I need to go to sleep now, but I will continue following your advice tomorrow. Thank you very much.

plagasul commented 1 week ago

Hello again,

Discussing with the hivemind, I've been trying this .htaccess

# Enable the rewrite engine
RewriteEngine On

# Disable MultiViews to prevent Apache from guessing filenames
Options -MultiViews

# Set the default file to serve for directories
DirectoryIndex index.html index.htm index.php

# Serve 404 page when a file or directory is not found
ErrorDocument 404 /404.html

# Ensure proper charset handling (UTF-8)
AddDefaultCharset UTF-8

# Ensure the request isn't already encoded (prevents double encoding)
RewriteCond %{THE_REQUEST} !% [NC]

# Serve the requested resource as .html if such .html exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]

# Serve index.html if it exists, otherwise rewrite to the file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.html [L]

This did not change the behaviour. But I also tried something ChatHivePT suggested: manually replacing "ó" with "%C3%B3" in the title of a folder AND a note, to see what happened.

Nothing changed (same behaviour), but I noticed that for the file Introducci%C3%B3n.html, the DevTools window title bar showed Introduccio%CC%81n.html, which I believe are two different encodings. (The hive says the first is precomposed Unicode, Normalization Form C (NFC), and the second is decomposed Unicode, Normalization Form D (NFD).)

Relevant?

Thank you
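The NFC/NFD difference can be reproduced with the standard `String.prototype.normalize` and `encodeURIComponent`. Which form a given file or link actually uses on this host is not established in the thread; the sketch only shows that the two spellings are different strings that percent-encode to different URLs, so a server doing a byte-for-byte filename lookup would treat them as different files.

```typescript
// "ó" can be stored as one code point (NFC) or as "o" + U+0301 combining acute (NFD).
const nfc = "Introducción".normalize("NFC")
const nfd = "Introducción".normalize("NFD")

console.log(nfc === nfd) // false: different code point sequences
console.log(nfc.length, nfd.length) // 12 13
console.log(encodeURIComponent(nfc)) // "Introducci%C3%B3n"
console.log(encodeURIComponent(nfd)) // "Introduccio%CC%81n"

// Normalizing both sides to the same form makes them compare equal again.
console.log(nfd.normalize("NFC") === nfc) // true
```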

saberzero1 commented 1 week ago

Morning,

# Enable the rewrite engine
RewriteEngine On
Options -MultiViews

# Default directory file
DirectoryIndex index.html index.htm index.php

# Custom 404 Error Page
ErrorDocument 404 /404.html

# Default charset
AddDefaultCharset UTF-8

# Avoid double encoding
RewriteCond %{THE_REQUEST} !% [NC]

# Serve .html if present
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]

# Serve index.html for unknown resources
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.html [L]

It might be partly related to you using shared hosting. Some options might be ignored to protect the other users sharing your physical machine.

Can you try the Options -MultiViews?

https://stackoverflow.com/questions/25423141/what-exactly-does-the-multiviews-options-in-htaccess
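One thing worth checking about the `RewriteCond %{THE_REQUEST} !% [NC]` line: per the mod_rewrite docs, THE_REQUEST is the raw, undecoded request line, and browsers percent-encode non-ASCII characters in the path, so the request line for an accented note would normally contain `%`. The sketch below uses the `URL` class as a stand-in for what the browser sends; how this particular host handles it has not been verified.

```typescript
// Build the request path the way a browser would for an accented note URL.
// "\u00f3" is the precomposed "ó", written as an escape so the input form is unambiguous.
const url = new URL("https://example.net/this_wiki/5.-Glosario/Contemplaci\u00f3n")
const requestLine = `GET ${url.pathname} HTTP/1.1`

console.log(requestLine) // "GET /this_wiki/5.-Glosario/Contemplaci%C3%B3n HTTP/1.1"

// `!%` means "the request line does NOT contain %", so the condition fails here,
// and the RewriteRule it guards (the .html rewrite right after it) would be skipped.
const conditionPasses = !requestLine.includes("%")
console.log(conditionPasses) // false
```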

saberzero1 commented 1 week ago

And if this doesn't work, I have another workaround. I feel like we should work on a permanent fix, but we could change a small bit of code to make the filenames and links convert to non-accented characters, while keeping the visible titles and link texts as is.
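A minimal sketch of that workaround, assuming slugs are built per path segment as in Quartz's `sluggify`: decompose to NFD and drop the combining marks, so `Contemplación` becomes `Contemplacion` while the visible title is left alone. `stripAccents` is a hypothetical helper, not an existing Quartz function, and it only handles characters that decompose (it leaves `ß` or `Ø` unchanged, for example).

```typescript
// Hypothetical helper: remove diacritics from a slug while keeping the base letters.
function stripAccents(slug: string): string {
  return (
    slug
      // Split precomposed letters into base letter + combining mark(s), e.g. "ó" -> "o" + U+0301.
      .normalize("NFD")
      // Drop the Combining Diacritical Marks block (U+0300 to U+036F).
      .replace(/[\u0300-\u036f]/g, "")
  )
}

console.log(stripAccents("4.-Metodología/Tablas")) // "4.-Metodologia/Tablas"
console.log(stripAccents("5.-Glosario/Contemplación")) // "5.-Glosario/Contemplacion"
console.log(stripAccents("Straße")) // "Straße" (ß has no decomposition)
```

Links would need to go through the same helper so that hrefs and generated file names agree, while the link text keeps the original title.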

plagasul commented 1 week ago

Hello, thank you. If you re-check my last .htaccess, you'll see Options -MultiViews is already there. Tell me what you want me to try, please. Thank you.

saberzero1 commented 1 week ago

The part under avoid double encoding.

plagasul commented 1 week ago

Isn't the last code you sent exactly the same as the last code I sent (minus comments)? You 4 hours ago, me 5 hours ago.

saberzero1 commented 1 week ago

# Enable the rewrite engine
RewriteEngine On
Options -MultiViews

# Default directory file
DirectoryIndex index.html index.htm index.php

# Custom 404 Error Page
ErrorDocument 404 /404.html

# Default charset
AddDefaultCharset UTF-8

# Avoid double encoding
RewriteCond %{THE_REQUEST} !% [NC]

# Rewrite all special characters
RewriteRule ^(.*)(A|Á|á|Â|â|Æ|æ|À|à|Å|å|Ã|ã|Ä|ä)(.*)$ $1a$3
RewriteRule ^(.*)(B)(.*)$ $1b$3
RewriteRule ^(.*)(C|Ç|ç)(.*)$ $1c$3
RewriteRule ^(.*)(C|Ç|ç)(.*)$ $1c$3
RewriteRule ^(.*)(D)(.*)$ $1d$3
RewriteRule ^(.*)(E|É|é|Ê|ê|È|è|Ð|ð|Ë|ë)(.*)$ $1e$3
RewriteRule ^(.*)(F)(.*)$ $1f$3
RewriteRule ^(.*)(G)(.*)$ $1g$3
RewriteRule ^(.*)(H)(.*)$ $1h$3
RewriteRule ^(.*)(I|Í|í|Î|î|Ì|ì|Ï|ï)(.*)$ $1i$3
RewriteRule ^(.*)(J)(.*)$ $1j$3
RewriteRule ^(.*)(K)(.*)$ $1k$3
RewriteRule ^(.*)(L)(.*)$ $1l$3
RewriteRule ^(.*)(M)(.*)$ $1m$3
RewriteRule ^(.*)(N|Ñ|ñ)(.*)$ $1n$3
RewriteRule ^(.*)(O|Ó|ó|Ô|ô|Œ|œ|Ò|ò|Ø|ø|Õ|õ|Ö|ö)(.*)$ $1o$3
RewriteRule ^(.*)(P|ß|Þ|þ)(.*)$ $1p$3
RewriteRule ^(.*)(Q)(.*)$ $1q$3
RewriteRule ^(.*)(R)(.*)$ $1r$3
RewriteRule ^(.*)(S)(.*)$ $1s$3
RewriteRule ^(.*)(T)(.*)$ $1t$3
RewriteRule ^(.*)(U|Ú|ú|Û|û|Ù|ù|Ü|ü)(.*)$ $1u$3
RewriteRule ^(.*)(V)(.*)$ $1v$3
RewriteRule ^(.*)(W)(.*)$ $1w$3
RewriteRule ^(.*)(X)(.*)$ $1x$3
RewriteRule ^(.*)(Y)(.*)$ $1y$3
RewriteRule ^(.*)(Z)(.*)$ $1z$3
RewriteRule ^ -

# Serve .html if present
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]

# Serve index.html for unknown resources
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.html [L]
plagasul commented 1 week ago

Although beautiful, that did not work.

A friend found a solution with the help of gpto: not in .htaccess but in Quartz's path.ts, adding a normalize("NFC") call in several places. We cannot evaluate it in the context of the Quartz code, though, and I assume it will not be adequate for all scenarios, as it forces NFC. Please take a look:

import { slug as slugAnchor } from "github-slugger"
import type { Element as HastElement } from "hast"
import rfdc from "rfdc"

export const clone = rfdc()

// this file must be isomorphic so it can't use node libs (e.g. path)

export const QUARTZ = "quartz"

/// Utility type to simulate nominal types in TypeScript
type SlugLike<T> = string & { __brand: T }

/** Cannot be relative and must have a file extension. */
export type FilePath = SlugLike<"filepath">
export function isFilePath(s: string): s is FilePath {
  s = s.normalize("NFC") // Normalize Unicode to NFC form

  const validStart = !s.startsWith(".")
  return validStart && _hasFileExtension(s)
}

/** Cannot be relative and may not have leading or trailing slashes. It can have `index` as its last segment. Use this wherever possible, as it's the most 'general' interpretation of a slug. */
export type FullSlug = SlugLike<"full">
export function isFullSlug(s: string): s is FullSlug {
  s = s.normalize("NFC") // Normalize Unicode to NFC form

  const validStart = !(s.startsWith(".") || s.startsWith("/"))
  const validEnding = !s.endsWith("/")
  return validStart && validEnding && !containsForbiddenCharacters(s)
}

/** Shouldn't be a relative path and shouldn't have `/index` as an ending or a file extension. It _can_ however have a trailing slash to indicate a folder path. */
export type SimpleSlug = SlugLike<"simple">
export function isSimpleSlug(s: string): s is SimpleSlug {
  s = s.normalize("NFC") // Normalize Unicode to NFC form

  const validStart = !(s.startsWith(".") || (s.length > 1 && s.startsWith("/")))
  const validEnding = !endsWith(s, "index")
  return validStart && !containsForbiddenCharacters(s) && validEnding && !_hasFileExtension(s)
}

/** Can be found on `href`s but can also be constructed for client-side navigation (e.g. search and graph) */
export type RelativeURL = SlugLike<"relative">
export function isRelativeURL(s: string): s is RelativeURL {
  const validStart = /^\.{1,2}/.test(s)
  const validEnding = !endsWith(s, "index")
  return validStart && validEnding && ![".md", ".html"].includes(_getFileExtension(s) ?? "")
}

export function getFullSlug(window: Window): FullSlug {
  // Read the slug from <body data-slug> and normalize it to NFC (precomposed Unicode)
  const slug = window.document.body.dataset.slug!
  return slug.normalize("NFC") as FullSlug
}

function sluggify(s: string): string {
  return s
    .split("/")
    .map((segment) =>
      segment
        .replace(/\s/g, "-")
        .replace(/&/g, "-and-")
        .replace(/%/g, "-percent")
        .replace(/\?/g, "")
        .replace(/#/g, ""),
    )
    .join("/") // always use / as sep
    .replace(/\/$/, "")
}

export function slugifyFilePath(fp: FilePath, excludeExt?: boolean): FullSlug {
  fp = stripSlashes(fp) as FilePath
  let ext = _getFileExtension(fp)
  const withoutFileExt = fp.replace(new RegExp(ext + "$"), "")
  if (excludeExt || [".md", ".html", undefined].includes(ext)) {
    ext = ""
  }

  let slug = sluggify(withoutFileExt)

  // treat _index as index
  if (endsWith(slug, "_index")) {
    slug = slug.replace(/_index$/, "index")
  }

  return (slug + ext) as FullSlug
}

export function simplifySlug(fp: FullSlug): SimpleSlug {
  const res = stripSlashes(trimSuffix(fp, "index"), true)
  return (res.length === 0 ? "/" : res) as SimpleSlug
}

export function transformInternalLink(link: string): RelativeURL {
  let [fplike, anchor] = splitAnchor(decodeURI(link))

  const folderPath = isFolderPath(fplike)
  let segments = fplike.split("/").filter((x) => x.length > 0)
  let prefix = segments.filter(isRelativeSegment).join("/")
  let fp = segments.filter((seg) => !isRelativeSegment(seg) && seg !== "").join("/")

  // manually add ext here as we want to not strip 'index' if it has an extension
  const simpleSlug = simplifySlug(slugifyFilePath(fp as FilePath))
  const joined = joinSegments(stripSlashes(prefix), stripSlashes(simpleSlug))
  const trail = folderPath ? "/" : ""
  const res = (_addRelativeToStart(joined) + trail + anchor) as RelativeURL
  return res
}

// from micromorph/src/utils.ts
// https://github.com/natemoo-re/micromorph/blob/main/src/utils.ts#L5
const _rebaseHtmlElement = (el: Element, attr: string, newBase: string | URL) => {
  const rebased = new URL(el.getAttribute(attr)!, newBase)
  el.setAttribute(attr, rebased.pathname + rebased.hash)
}
export function normalizeRelativeURLs(el: Element | Document, destination: string | URL) {
  el.querySelectorAll('[href^="./"], [href^="../"]').forEach((item) =>
    _rebaseHtmlElement(item, "href", destination),
  )
  el.querySelectorAll('[src^="./"], [src^="../"]').forEach((item) =>
    _rebaseHtmlElement(item, "src", destination),
  )
}

const _rebaseHastElement = (
  el: HastElement,
  attr: string,
  curBase: FullSlug,
  newBase: FullSlug,
) => {
  if (el.properties?.[attr]) {
    if (!isRelativeURL(String(el.properties[attr]))) {
      return
    }

    const rel = joinSegments(resolveRelative(curBase, newBase), "..", el.properties[attr] as string)
    el.properties[attr] = rel
  }
}

export function normalizeHastElement(rawEl: HastElement, curBase: FullSlug, newBase: FullSlug) {
  const el = clone(rawEl) // clone so we dont modify the original page
  _rebaseHastElement(el, "src", curBase, newBase)
  _rebaseHastElement(el, "href", curBase, newBase)
  if (el.children) {
    el.children = el.children.map((child) =>
      normalizeHastElement(child as HastElement, curBase, newBase),
    )
  }

  return el
}

// resolve /a/b/c to ../..
export function pathToRoot(slug: FullSlug): RelativeURL {
  let rootPath = slug
    .split("/")
    .filter((x) => x !== "")
    .slice(0, -1)
    .map((_) => "..")
    .join("/")

  if (rootPath.length === 0) {
    rootPath = "."
  }

  return rootPath as RelativeURL
}

export function resolveRelative(current: FullSlug, target: FullSlug | SimpleSlug): RelativeURL {
  const res = joinSegments(pathToRoot(current), simplifySlug(target as FullSlug)) as RelativeURL
  return res
}

export function splitAnchor(link: string): [string, string] {
  let [fp, anchor] = link.split("#", 2)
  if (fp.endsWith(".pdf")) {
    return [fp, anchor === undefined ? "" : `#${anchor}`]
  }
  anchor = anchor === undefined ? "" : "#" + slugAnchor(anchor)
  return [fp, anchor]
}

export function slugTag(tag: string) {
  return tag
    .split("/")
    .map((tagSegment) => sluggify(tagSegment))
    .join("/")
}

export function joinSegments(...args: string[]): string {
  return args
    .filter((segment) => segment !== "")
    .join("/")
    .replace(/\/\/+/g, "/")
}

export function getAllSegmentPrefixes(tags: string): string[] {
  const segments = tags.split("/")
  const results: string[] = []
  for (let i = 0; i < segments.length; i++) {
    results.push(segments.slice(0, i + 1).join("/"))
  }
  return results
}

export interface TransformOptions {
  strategy: "absolute" | "relative" | "shortest"
  allSlugs: FullSlug[]
}

export function transformLink(src: FullSlug, target: string, opts: TransformOptions): RelativeURL {
  let targetSlug = transformInternalLink(target)

  if (opts.strategy === "relative") {
    return targetSlug as RelativeURL
  } else {
    const folderTail = isFolderPath(targetSlug) ? "/" : ""
    const canonicalSlug = stripSlashes(targetSlug.slice(".".length))
    let [targetCanonical, targetAnchor] = splitAnchor(canonicalSlug)

    if (opts.strategy === "shortest") {
      // if the file name is unique, then it's just the filename
      const matchingFileNames = opts.allSlugs.filter((slug) => {
        const parts = slug.split("/")
        const fileName = parts.at(-1)
        return targetCanonical === fileName
      })

      // only match, just use it
      if (matchingFileNames.length === 1) {
        const targetSlug = matchingFileNames[0]
        return (resolveRelative(src, targetSlug) + targetAnchor) as RelativeURL
      }
    }

    // if it's not unique, then it's the absolute path from the vault root
    return (joinSegments(pathToRoot(src), canonicalSlug) + folderTail) as RelativeURL
  }
}

// path helpers
function isFolderPath(fplike: string): boolean {
  return (
    fplike.endsWith("/") ||
    endsWith(fplike, "index") ||
    endsWith(fplike, "index.md") ||
    endsWith(fplike, "index.html")
  )
}

export function endsWith(s: string, suffix: string): boolean {
  return s === suffix || s.endsWith("/" + suffix)
}

function trimSuffix(s: string, suffix: string): string {
  if (endsWith(s, suffix)) {
    s = s.slice(0, -suffix.length)
  }
  return s
}

function containsForbiddenCharacters(s: string): boolean {
  return s.includes(" ") || s.includes("#") || s.includes("?") || s.includes("&")
}

function _hasFileExtension(s: string): boolean {
  return _getFileExtension(s) !== undefined
}

function _getFileExtension(s: string): string | undefined {
  return s.match(/\.[A-Za-z0-9]+$/)?.[0]
}

function isRelativeSegment(s: string): boolean {
  return /^\.{0,2}$/.test(s)
}

export function stripSlashes(s: string, onlyStripPrefix?: boolean): string {
  s = s.normalize("NFC") // Normalize Unicode to NFC form (precomposed)

  if (s.startsWith("/")) {
    s = s.substring(1)
  }

  if (!onlyStripPrefix && s.endsWith("/")) {
    s = s.slice(0, -1)
  }

  return s
}

function _addRelativeToStart(s: string): string {
  if (s === "") {
    s = "."
  }

  if (!s.startsWith(".")) {
    s = joinSegments(".", s)
  }

  return s
}
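An alternative to normalizing inside every type guard is to normalize once, where slugs are produced. The sketch below is a standalone copy of the `sluggify` logic shown above with a single `.normalize("NFC")` added; `sluggifyNFC` is a hypothetical name, and whether this covers every place Quartz builds or compares slugs (links, search, graph) has not been checked.

```typescript
// Standalone copy of the sluggify() logic above, normalizing to NFC first so that
// "ó" stored as "o" + U+0301 (NFD) and "ó" stored as one code point (NFC)
// produce the same slug, whichever form the file name on disk uses.
function sluggifyNFC(s: string): string {
  return s
    .normalize("NFC")
    .split("/")
    .map((segment) =>
      segment
        .replace(/\s/g, "-")
        .replace(/&/g, "-and-")
        .replace(/%/g, "-percent")
        .replace(/\?/g, "")
        .replace(/#/g, ""),
    )
    .join("/") // always use / as separator
    .replace(/\/$/, "")
}

const fromNFD = sluggifyNFC("4. Metodología/Tablas".normalize("NFD"))
const fromNFC = sluggifyNFC("4. Metodología/Tablas".normalize("NFC"))
console.log(fromNFD === fromNFC) // true
console.log(fromNFC) // "4.-Metodología/Tablas"
```

Normalizing at this single entry point means generated file names and hrefs share one form, instead of relying on each checker to normalize its own copy of the string.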
saberzero1 commented 1 week ago

Yeah, that's basically what we tried to do a few days ago through .htaccess.

The largest caveat is that it forces NFC normalization on every path, which won't suit every setup. NFC is a Unicode normalization form, not an encoding, so the pages are still served as UTF-8.
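Calling `normalize("NFC")` does not change how text is encoded on the wire. It only decides which code points represent a character. The sketch below uses the standard `TextEncoder` to compare UTF-8 byte counts for the NFC and NFD forms of the same word. For accented Latin text like this example, the precomposed NFC form is usually the same size as the decomposed form or smaller. The word `café` is just an illustrative input.

```typescript
// Count the bytes a string occupies when encoded as UTF-8.
const encoder = new TextEncoder()

function utf8Bytes(s: string): number {
  return encoder.encode(s).length
}

const word = "caf\u00e9" // "café", already in NFC
const nfc = word.normalize("NFC")
const nfd = word.normalize("NFD")

console.log(utf8Bytes(nfc)) // 5: "c", "a", "f" are 1 byte each, U+00E9 is 2 bytes
console.log(utf8Bytes(nfd)) // 6: "e" is 1 byte, U+0301 is 2 bytes
console.log(utf8Bytes("quartz".normalize("NFC"))) // 6: ASCII text is unaffected
```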

I'll look into making it optional, or activating it only when needed. Not sure when, though; I have a lot of pending things for Quartz atm.
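One possible way to make normalization optional is to route it through a configuration value. The sketch below shows that idea. The `normalizeUnicode` option, the `PathOptions` type and the `normalizePath` helper are all hypothetical and are not part of Quartz's current configuration. With the value `"none"`, the path passes through untouched, which is how `stripSlashes` behaved before the patch.

```typescript
// Hypothetical setting: Quartz does not currently expose this option.
type UnicodeForm = "NFC" | "NFD" | "none"

interface PathOptions {
  normalizeUnicode: UnicodeForm
}

// Normalize only when the option asks for it.
function normalizePath(s: string, opts: PathOptions): string {
  return opts.normalizeUnicode === "none" ? s : s.normalize(opts.normalizeUnicode)
}

// stripSlashes variant that defers the normalization decision to the option.
function stripSlashes(s: string, opts: PathOptions, onlyStripPrefix?: boolean): string {
  s = normalizePath(s, opts)
  if (s.startsWith("/")) {
    s = s.substring(1)
  }
  if (!onlyStripPrefix && s.endsWith("/")) {
    s = s.slice(0, -1)
  }
  return s
}

const decomposed = "/notes/cafe\u0301/" // hypothetical NFD path
console.log(stripSlashes(decomposed, { normalizeUnicode: "none" }) === "notes/cafe\u0301") // true
console.log(stripSlashes(decomposed, { normalizeUnicode: "NFC" }) === "notes/caf\u00e9") // true
```

This design keeps the change in one place. The cost is that the option has to be passed to every caller that currently uses `stripSlashes` directly.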

plagasul commented 1 week ago

Thank you for your help. I will keep trying to solve it at the .htaccess level, perhaps by contacting my hosting provider to find out which forced configuration prevents me from overriding it, and I will report back.
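The `normalize("NFC")` patch targets a mismatch between NFC and NFD spellings of the same name. For that reason, when debugging at the .htaccess level, it may help to check which uploaded file names would change under NFC. The sketch below assumes you already have a list of relative paths under the uploaded `public` folder, for example copied from an FTP listing. The function names and sample paths are hypothetical.

```typescript
// A path whose NFC form differs from how it was stored.
interface NfcMismatch {
  original: string
  nfc: string
}

// Keep only the paths that NFC normalization would change.
function findNonNfcPaths(paths: string[]): NfcMismatch[] {
  return paths
    .map((original) => ({ original, nfc: original.normalize("NFC") }))
    .filter(({ original, nfc }) => original !== nfc)
}

// Escape non-ASCII code points so the difference is visible in a terminal.
function showCodePoints(s: string): string {
  return Array.from(s)
    .map((ch) => {
      const cp = ch.codePointAt(0) ?? 0
      return cp < 0x80 ? ch : `\\u{${cp.toString(16)}}`
    })
    .join("")
}

const listing = ["index.html", "notes/cafe\u0301.html", "notes/caf\u00e9-2.html"]
for (const m of findNonNfcPaths(listing)) {
  console.log(`${showCodePoints(m.original)} -> ${showCodePoints(m.nfc)}`)
}
// prints: notes/cafe\u{301}.html -> notes/caf\u{e9}.html
```

An empty result would suggest that the file names themselves are not the problem.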