jasongitmail / super-sitemap

SvelteKit sitemap focused on ease of use and making it impossible to forget to add your paths.
https://npmjs.com/package/super-sitemap
MIT License
Svelte Super Sitemap

SvelteKit sitemap focused on ease of use and
making it impossible to forget to add your paths.


Table of Contents

Features

Limitations

Installation

npm i -D super-sitemap

or

bun add -d super-sitemap

Then see the Usage, Robots.txt, & Playwright Test sections.

Usage

Basic example

JavaScript:

// /src/routes/sitemap.xml/+server.js
import * as sitemap from 'super-sitemap';

export const GET = async () => {
  return await sitemap.response({
    origin: 'https://example.com',
  });
};

TypeScript:

// /src/routes/sitemap.xml/+server.ts
import * as sitemap from 'super-sitemap';
import type { RequestHandler } from '@sveltejs/kit';

export const GET: RequestHandler = async () => {
  return await sitemap.response({
    origin: 'https://example.com',
  });
};

Always include the .xml extension on your sitemap route name–e.g. sitemap.xml. This ensures your web server always sends the correct application/xml content type even if you decide to prerender your sitemap to static files.

The "everything" example

Everything in the example below is optional except origin, plus paramValues whenever you have parameterized routes that need data.

JavaScript:

// /src/routes/sitemap.xml/+server.js
import { error } from '@sveltejs/kit'; // needed for the error(500, ...) call below
import * as sitemap from 'super-sitemap';
import * as blog from '$lib/data/blog';

export const prerender = true; // optional

export const GET = async () => {
  // Get data for parameterized routes however you need to; this is only an example.
  let blogSlugs, blogTags;
  try {
    [blogSlugs, blogTags] = await Promise.all([blog.getSlugs(), blog.getTags()]);
  } catch (err) {
    throw error(500, 'Could not load data for param values.');
  }

  return await sitemap.response({
    origin: 'https://example.com',
    excludeRoutePatterns: [
      '^/dashboard.*', // i.e. routes starting with `/dashboard`
      '.*\\[page=integer\\].*', // i.e. routes containing `[page=integer]`–e.g. `/blog/2`
      '.*\\(authenticated\\).*', // i.e. routes within a group
    ],
    paramValues: {
      '/blog/[slug]': blogSlugs, // e.g. ['hello-world', 'another-post']
      '/blog/tag/[tag]': blogTags, // e.g. ['red', 'green', 'blue']
      '/campsites/[country]/[state]': [
        ['usa', 'new-york'],
        ['usa', 'california'],
        ['canada', 'toronto'],
      ],
    },
    headers: {
      'custom-header': 'foo', // case insensitive; xml content type & 1h CDN cache by default
    },
    additionalPaths: [
      '/foo.pdf', // e.g. to a file in your static dir
    ],
    changefreq: 'daily', // excluded by default b/c ignored by modern search engines
    priority: 0.7, // excluded by default b/c ignored by modern search engines
    sort: 'alpha', // default is false; 'alpha' sorts all paths alphabetically.
    processPaths: (paths) => {
      // A callback to allow arbitrary processing of your path objects. See the
      // processPaths() section of the README.
      return paths;
    },
  });
};

TypeScript:

// /src/routes/sitemap.xml/+server.ts
import { error } from '@sveltejs/kit'; // needed for the error(500, ...) call below
import type { RequestHandler } from '@sveltejs/kit';
import * as sitemap from 'super-sitemap';
import * as blog from '$lib/data/blog';

export const prerender = true; // optional

export const GET: RequestHandler = async () => {
  // Get data for parameterized routes however you need to; this is only an example.
  let blogSlugs, blogTags;
  try {
    [blogSlugs, blogTags] = await Promise.all([blog.getSlugs(), blog.getTags()]);
  } catch (err) {
    throw error(500, 'Could not load data for param values.');
  }

  return await sitemap.response({
    origin: 'https://example.com',
    excludeRoutePatterns: [
      '^/dashboard.*', // i.e. routes starting with `/dashboard`
      '.*\\[page=integer\\].*', // i.e. routes containing `[page=integer]`–e.g. `/blog/2`
      '.*\\(authenticated\\).*', // i.e. routes within a group
    ],
    paramValues: {
      '/blog/[slug]': blogSlugs, // e.g. ['hello-world', 'another-post']
      '/blog/tag/[tag]': blogTags, // e.g. ['red', 'green', 'blue']
      '/campsites/[country]/[state]': [
        ['usa', 'new-york'],
        ['usa', 'california'],
        ['canada', 'toronto'],
      ],
    },
    headers: {
      'custom-header': 'foo', // case insensitive; xml content type & 1h CDN cache by default
    },
    additionalPaths: [
      '/foo.pdf', // e.g. to a file in your static dir
    ],
    changefreq: 'daily', // excluded by default b/c ignored by modern search engines
    priority: 0.7, // excluded by default b/c ignored by modern search engines
    sort: 'alpha', // default is false; 'alpha' sorts all paths alphabetically.
    processPaths: (paths: sitemap.PathObj[]) => {
      // A callback to allow arbitrary processing of your path objects. See the
      // processPaths() section of the README.
      return paths;
    },
  });
};
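To make the two paramValues shapes concrete, here is a minimal sketch of how a parameterized route and its values could expand into concrete paths. The expandRoute helper is hypothetical and exists only for illustration; it is not Super Sitemap's implementation, and it does not handle optional [[param]] segments.

```typescript
// Hypothetical helper for illustration only; not part of super-sitemap.
function expandRoute(route: string, values: Array<string | string[]>): string[] {
  return values.map((value) => {
    // A single-param route takes a flat array of strings; a multi-param
    // route takes an array of arrays, one inner entry per param, in order.
    const parts = Array.isArray(value) ? [...value] : [value];
    // Replace each `[param]` segment, left to right, with the next value.
    return route.replace(/\[[^\]]+\]/g, () => parts.shift() ?? '');
  });
}

const blogPaths = expandRoute('/blog/[slug]', ['hello-world', 'another-post']);
const campsitePaths = expandRoute('/campsites/[country]/[state]', [
  ['usa', 'new-york'],
  ['canada', 'toronto'],
]);

console.log(blogPaths); // ['/blog/hello-world', '/blog/another-post']
console.log(campsitePaths); // ['/campsites/usa/new-york', '/campsites/canada/toronto']
```

The order of values inside each inner array must follow the order of the params in the route.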

Sitemap Index

You can enable sitemap index support with just two changes:

  1. Rename your route to sitemap[[page]].xml
  2. Pass the page param via your sitemap config

JavaScript:

// /src/routes/sitemap[[page]].xml/+server.js
import * as sitemap from 'super-sitemap';

export const GET = async ({ params }) => {
  return await sitemap.response({
    origin: 'https://example.com',
    page: params.page,
    // maxPerPage: 45_000 // optional; defaults to 50_000
  });
};

TypeScript:

// /src/routes/sitemap[[page]].xml/+server.ts
import * as sitemap from 'super-sitemap';
import type { RequestHandler } from '@sveltejs/kit';

export const GET: RequestHandler = async ({ params }) => {
  return await sitemap.response({
    origin: 'https://example.com',
    page: params.page,
    // maxPerPage: 45_000 // optional; defaults to 50_000
  });
};

Feel free to always set up your sitemap in this manner given it will work optimally whether you have few or many URLs.

Your sitemap.xml route will now return a regular sitemap when the total number of URLs is less than or equal to maxPerPage (which defaults to 50,000, per the sitemap protocol), or a sitemap index when the total exceeds maxPerPage.

The sitemap index will contain links to sitemap1.xml, sitemap2.xml, etc, which contain your paginated URLs automatically.

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap2.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap3.xml</loc>
  </sitemap>
</sitemapindex>
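The pagination rule above can be sketched as a small function. This is only an illustration of the rule as described; planSitemap and SitemapPlan are hypothetical names, not part of Super Sitemap's API.

```typescript
// Hypothetical sketch of the pagination rule described above; the function
// and type names are illustrative and not part of super-sitemap's API.
type SitemapPlan = { kind: 'sitemap' } | { kind: 'index'; pages: string[] };

function planSitemap(origin: string, totalUrls: number, maxPerPage = 50_000): SitemapPlan {
  // At or below the limit, a single regular sitemap is enough.
  if (totalUrls <= maxPerPage) return { kind: 'sitemap' };

  // Above the limit, list one paginated sitemap per chunk of maxPerPage URLs.
  const pageCount = Math.ceil(totalUrls / maxPerPage);
  const pages = Array.from({ length: pageCount }, (_, i) => `${origin}/sitemap${i + 1}.xml`);
  return { kind: 'index', pages };
}

const small = planSitemap('https://example.com', 50_000);
const large = planSitemap('https://example.com', 120_000);

console.log(small); // { kind: 'sitemap' }
console.log(large); // index listing sitemap1.xml, sitemap2.xml, sitemap3.xml
```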

Optional Params

SvelteKit allows you to create a route with one or more optional parameters like this:

src/
  routes/
    something/
      [[paramA]]/
        [[paramB]]/
          +page.svelte
          +page.ts

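Your app would then respond to HTTP requests for all of the following, where foo and bar stand in for any values of paramA and paramB:

  • /something
  • /something/foo
  • /something/foo/bar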

Consequently, Super Sitemap will include all such path variations in your sitemap and will require you to either exclude these using excludeRoutePatterns or provide param values for them using paramValues, within your sitemap config object.

For example:

Alternatively, you can exclude ALL versions of this route by providing a single regex pattern within excludeRoutePatterns that matches all of them, such as /something; notice this does NOT end with a $, which allows the pattern to match all 3 versions of this route.

If you plan to mix and match use of excludeRoutePatterns and paramValues for a given route that contains optional params, terminate all of your excludeRoutePatterns for that route with $, to target only the specific desired versions of that route.
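The effect of the trailing $ can be checked with plain regular expressions. The sketch below assumes, as the text describes, that each pattern is tested against the route pattern string; the route strings and the matching helper are written for this illustration only.

```typescript
// Illustrative only: shows how a trailing `$` changes which route patterns
// a regular expression matches. The routes mirror the optional-param example.
const routes = [
  '/something',
  '/something/[[paramA]]',
  '/something/[[paramA]]/[[paramB]]',
];

function matching(pattern: string): string[] {
  const regex = new RegExp(pattern);
  return routes.filter((route) => regex.test(route));
}

// No `$`: matches every version of the route.
const broad = matching('/something');
// With `$`: matches only the version that ends right after `/something`.
const exact = matching('/something$');
// Brackets must be escaped to target one specific optional-param version.
const onlyParamA = matching('/something/\\[\\[paramA\\]\\]$');

console.log(broad.length); // 3
console.log(exact); // ['/something']
console.log(onlyParamA); // ['/something/[[paramA]]']
```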

processPaths() callback

The processPaths() callback is powerful, but not needed in most cases.

It runs after all paths have been generated for your site, but prior to de-duplication of paths based on unique path names, sorting (if enabled by your config), and creation of XML.

This allows you to arbitrarily process the path objects for your site before they become XML, with the only requirement that your callback function must return the expected type of PathObj[].

This can be useful to do something bespoke that would not otherwise be possible. For example:

  1. Excluding a specific path, when excludeRoutePatterns based on the route pattern would be too broad. (For example, you might want to exclude a path when you have not yet translated its content into one or more of your site’s supported languages; e.g. to exclude only /zh/about, but retain all others like /about, /es/about, etc.)
  2. Adding a trailing slash to URLs (not a recommended style, but possible).
  3. Appending paths from an external sitemap, like from a hosted headless blog backend. However, you can also accomplish this by providing these within the additionalPaths array in your super sitemap config, which is a more concise approach.

Note that processPaths() is intentionally NOT async. This encourages a consistent pattern within the sitemap request handler, where all HTTP requests, including any that fetch param values from a database, occur together using Promise.all(). That gives the best performance and a consistent code pattern among Super Sitemap users.
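The ordering described in this section (callback first, then de-duplication, then optional sorting) can be sketched as follows. PathObj here is a simplified local stand-in and finalize is a hypothetical function; which duplicate the library keeps is not stated in this README, so keeping the first occurrence is an assumption of the sketch.

```typescript
// Illustrative sketch of the ordering described above. `PathObj` is a
// simplified local stand-in, and `finalize` is hypothetical, not library code.
type PathObj = { path: string };

function finalize(
  paths: PathObj[],
  processPaths: (paths: PathObj[]) => PathObj[],
  sort: 'alpha' | false = false,
): PathObj[] {
  // 1. The synchronous callback runs first, on every generated path.
  const processed = processPaths(paths);

  // 2. De-duplicate by path name (this sketch keeps the first occurrence).
  const seen = new Set<string>();
  const unique = processed.filter(({ path }) => {
    if (seen.has(path)) return false;
    seen.add(path);
    return true;
  });

  // 3. Sort only when enabled in the config.
  return sort === 'alpha' ? [...unique].sort((a, b) => a.path.localeCompare(b.path)) : unique;
}

const result = finalize(
  [{ path: '/b' }, { path: '/a' }],
  (paths) => [...paths, { path: '/a' }], // the callback adds a duplicate
  'alpha',
);

console.log(result.map((p) => p.path)); // ['/a', '/b']
```

Because de-duplication happens after the callback, a duplicate path introduced by processPaths() does not end up in the output twice.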

Example code - remove specific paths

return await sitemap.response({
  // ...
  processPaths: (paths: sitemap.PathObj[]) => {
    const pathsToExclude = ['/zh/about', '/de/team'];
    return paths.filter(({ path }) => !pathsToExclude.includes(path));
  },
});

Note: If using excludeRoutePatterns–which matches against the route pattern–would be sufficient for your needs, you should prefer it for performance reasons. A site has fewer routes than paths, so route-based exclusions are more performant than path-based exclusions. That said, the difference will be inconsequential in virtually all cases, unless you have a very large number of excluded paths and many millions of generated paths to search within.

Example code - add trailing slashes

return await sitemap.response({
  // ...
  processPaths: (paths: sitemap.PathObj[]) => {
    // Add trailing slashes to all paths. (This is just an example and not
    // actually recommended. Using SvelteKit's default of no trailing slash is
    // preferable because it provides consistency among all possible paths,
    // even files like `/foo.pdf`.)
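    const withSlash = (p: string) => (p.endsWith('/') ? p : `${p}/`); // leaves `/` as-is
    return paths.map(({ path, alternates, ...rest }) => {
      // Annotated as PathObj so that `alternates` can be assigned below.
      const rtrn: sitemap.PathObj = { path: withSlash(path), ...rest };

      if (alternates) {
        rtrn.alternates = alternates.map((alternate: sitemap.Alternate) => ({
          ...alternate,
          path: withSlash(alternate.path),
        }));
      }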

      return rtrn;
    });
  },
});

i18n

Super Sitemap supports multilingual site annotations within your sitemap. This allows search engines to be aware of alternate language versions of your pages.

Set up

  1. Create a directory named [[lang]] at src/routes/[[lang]]. Place any routes that you intend to translate inside here.

    • This parameter must be named lang.
    • This parameter can specify a param matcher, if desired. For example: src/routes/(public)/[[lang=lang]], when you have defined a param matcher at src/params/lang.js. The param matcher can have any name as long as it uses only lowercase letters.
    • This directory can be located within a route group, if desired, e.g. src/routes/(public)/[[lang]].
    • Advanced: If you want to require a language parameter as part of all your urls, use single square brackets like src/routes/[lang] or src/routes/[lang=lang]. Importantly, if you take this approach, you should redirect your index route (/) to one of your language-specific index paths (e.g. /en, /es, etc), because a root url of / will not be included in the sitemap when you have required the language param to exist. (The remainder of these docs will assume you are using an optional lang parameter.)
  2. Within your sitemap.xml route, update your Super Sitemap config object to add a lang property specifying your desired languages.

    lang: {
     default: 'en',           // e.g. /about
     alternates: ['zh', 'de'] // e.g. /zh/about, /de/about
    }

    The default language will not appear in your URLs (e.g. /about). Alternate languages will appear as part of the URLs within your sitemap (e.g. /zh/about, /de/about).

    These language properties accept any string value, but choose a valid language code. They will appear in two places: 1.) as a slug within your paths (e.g. /zh/about), and 2.) as hreflang attributes within the sitemap output.

    Note: If you used a required lang param (e.g. [lang]), you can set any of your desired languages as the default and the rest as the alternates; they will all be processed in the same way though.

  3. Within your sitemap.xml route again, update your Super Sitemap config object's paramValues to prepend /[[lang]] (or /[[lang=lang]], [lang], etc–whatever you used earlier) onto the property names of all routes you moved into your /src/routes/[[lang]] directory, e.g.:

    paramValues: {
     '/[[lang]]/blog/[slug]': ['hello-world', 'post-2'], // was '/blog/[slug]'
     '/[[lang]]/campsites/[country]/[state]': [ // was '/campsites/[country]/[state]'
       ['usa', 'new-york'],
       ['canada', 'toronto'],
     ],
    },

Example

  1. Create /src/routes/[[lang]]/about/+page.svelte with any content.
  2. Assuming you have a basic sitemap set up at /src/routes/sitemap.xml/+server.ts, add a lang property to your sitemap's config object, as described in Step 2 in the previous section.
  3. Your sitemap.xml will then include the following:
  ...
  <url>
    <loc>https://example.com/about</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/about" />
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/about" />
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/about" />
  </url>
  <url>
    <loc>https://example.com/de/about</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/about" />
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/about" />
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/about" />
  </url>
  <url>
    <loc>https://example.com/zh/about</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/about" />
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/about" />
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/about" />
  </url>
  ...
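The relationship between the lang config and the hreflang entries shown above can be sketched like this. The alternatesFor helper and both local types are hypothetical and only mirror the output format of the example; they are not Super Sitemap's implementation.

```typescript
// Hypothetical helper showing how the `lang` config maps to per-language
// paths and hreflang values; not super-sitemap's implementation.
type LangConfig = { default: string; alternates: string[] };
type Alternate = { lang: string; path: string };

function alternatesFor(path: string, lang: LangConfig): Alternate[] {
  // The default language has no slug; alternates are prefixed with `/<lang>`.
  const prefixed = (code: string) => (path === '/' ? `/${code}` : `/${code}${path}`);
  return [
    { lang: lang.default, path },
    ...lang.alternates.map((code) => ({ lang: code, path: prefixed(code) })),
  ];
}

const alternates = alternatesFor('/about', { default: 'en', alternates: ['zh', 'de'] });
console.log(alternates);
// [
//   { lang: 'en', path: '/about' },
//   { lang: 'zh', path: '/zh/about' },
//   { lang: 'de', path: '/de/about' },
// ]
```

Each language version of a page lists the same full set of alternates, which is why the three url entries in the example carry identical xhtml:link lines.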

Note on i18n

Super Sitemap handles creation of URLs within your sitemap, but it is not an i18n library.

You need a separate i18n library to translate strings within your app. Just ensure the library you choose allows a similar URL pattern as described here, with a default language (e.g. /about) and lang slugs for alternate languages (e.g. /zh/about, /de/about).

Q&A on i18n

Sampled URLs

Sampled URLs provides a utility to obtain a sample URL for each unique route on your site–i.e.:

  1. the URL for every static route (e.g. /, /about, /pricing, etc.), and
  2. one URL for each parameterized route (e.g. /blog/[slug])

This can be helpful for writing functional tests, performing SEO analyses of your public pages, & similar.

This data is generated by analyzing your site's sitemap.xml, so keep in mind that it will not contain any URLs excluded by excludeRoutePatterns in your sitemap config.

import { sampledUrls } from 'super-sitemap';

const urls = await sampledUrls('http://localhost:5173/sitemap.xml');
// [
//   'http://localhost:5173/',
//   'http://localhost:5173/about',
//   'http://localhost:5173/pricing',
//   'http://localhost:5173/features',
//   'http://localhost:5173/login',
//   'http://localhost:5173/signup',
//   'http://localhost:5173/blog',
//   'http://localhost:5173/blog/hello-world',
//   'http://localhost:5173/blog/tag/red',
// ]

Limitations

  1. Result URLs will not include any additionalPaths from your sitemap config because it's impossible to identify those by a pattern given only your routes and sitemap.xml as inputs.
  2. sampledUrls() does not distinguish between routes that differ only due to a pattern matcher. For example, /foo/[foo] and /foo/[foo=integer] will both be evaluated as /foo/[foo], and one sample URL will be returned.
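The second limitation can be pictured as stripping the matcher from each route before comparing routes. This is a sketch of the observable behavior only, with a hypothetical stripMatchers helper; it assumes matcher names use only lowercase letters and does not show how sampledUrls() is actually implemented.

```typescript
// Illustrative only: collapses routes that differ solely by a param matcher,
// mirroring the limitation described above. Not super-sitemap's own code.
const stripMatchers = (route: string): string => route.replace(/=[a-z]+\]/g, ']');

const sampleRoutes = ['/foo/[foo]', '/foo/[foo=integer]', '/blog/[slug]'];
const distinctRoutes = Array.from(new Set(sampleRoutes.map(stripMatchers)));

console.log(distinctRoutes); // ['/foo/[foo]', '/blog/[slug]']
```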

Designed as a testing utility

Both sampledUrls() and sampledPaths() are intended as utilities for use within your Playwright tests. Their design aims for developer convenience (i.e. no need to set up a 2nd sitemap config), not for performance, and they require a runtime with access to the file system like Node, to read your /src/routes. In other words, use for testing, not as a data source for production.

You can use it in a Playwright test as shown below; sampledPublicPaths will then be available to the tests in this file.

// foo.test.js
import { expect, test } from '@playwright/test';
import { sampledPaths } from 'super-sitemap';

let sampledPublicPaths = [];
try {
  sampledPublicPaths = await sampledPaths('http://localhost:4173/sitemap.xml');
} catch (err) {
  console.error('Error:', err);
}

// ...

Sampled Paths

Same as Sampled URLs, except it returns paths.

import { sampledPaths } from 'super-sitemap';

const paths = await sampledPaths('http://localhost:5173/sitemap.xml');
// [
//   '/about',
//   '/pricing',
//   '/features',
//   '/login',
//   '/signup',
//   '/blog',
//   '/blog/hello-world',
//   '/blog/tag/red',
// ]

Robots.txt

It's important to create a robots.txt so search engines know where to find your sitemap.

You can create it at /static/robots.txt:

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Or, at /src/routes/robots.txt/+server.ts, if you have defined PUBLIC_ORIGIN within your project's .env and want to access it:

import * as env from '$env/static/public';

export const prerender = true;

export async function GET(): Promise<Response> {
  // prettier-ignore
  const body = [
    'User-agent: *',
    'Allow: /',
    '',
    `Sitemap: ${env.PUBLIC_ORIGIN}/sitemap.xml`
  ].join('\n').trim();

  const headers = {
    'Content-Type': 'text/plain',
  };

  return new Response(body, { headers });
}

Playwright Test

It's recommended to add a Playwright test that calls your sitemap.

For pre-rendered sitemaps, you'll receive an error at build time if your data param values are misconfigured. But for non-prerendered sitemaps, your data is loaded when the sitemap is loaded, and consequently a functional test is more important to confirm you have not misconfigured data for your param values.

Feel free to use or adapt this example test:

// /src/tests/sitemap.test.js

import { expect, test } from '@playwright/test';

test('/sitemap.xml is valid', async ({ page }) => {
  const response = await page.goto('/sitemap.xml');
  expect(response.status()).toBe(200);

  // Ensure XML is valid. Playwright parses the XML here and will error if it
  // cannot be parsed.
  const urls = await page.$$eval('url', (urls) =>
    urls.map((url) => ({
      loc: url.querySelector('loc').textContent,
      // changefreq: url.querySelector('changefreq').textContent, // if you enabled in your sitemap
      // priority: url.querySelector('priority').textContent,
    }))
  );

  // Sanity check
  expect(urls.length).toBeGreaterThan(5);

  // Ensure entries are in a valid format.
  for (const url of urls) {
    expect(url.loc).toBeTruthy();
    expect(() => new URL(url.loc)).not.toThrow();
    // expect(url.changefreq).toBe('daily');
    // expect(url.priority).toBe('0.7');
  }
});

Querying your database for param values

As a helpful tip, below are a few examples demonstrating how to query an SQL database to obtain data to provide as paramValues for your routes:

-- Route: /blog/[slug]
SELECT slug FROM blog_posts WHERE status = 'published';

-- Route: /blog/category/[category]
SELECT DISTINCT LOWER(category) FROM blog_posts WHERE status = 'published';

-- Route: /campsites/[country]/[state]
SELECT DISTINCT LOWER(country), LOWER(state) FROM campsites;

Using DISTINCT will prevent duplicates in your result set. Use this when your table could contain multiple rows with the same params, like in the 2nd and 3rd examples. This will be the case for routes that show a list of items.

Then if your result is an array of objects, convert into an array of arrays of string values:

const arrayOfArrays = resultFromDB.map((row) => Object.values(row));
// [['usa','new-york'],['usa', 'california']]

That's it.
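As a variation, you can pick columns explicitly instead of relying on Object.values(), whose order follows the key order of each row object. The sketch below uses made-up row data and hypothetical column names; adjust them to whatever your database driver returns.

```typescript
// Hypothetical row data; real column names depend on your schema and driver.
const slugRows = [{ slug: 'hello-world' }, { slug: 'another-post' }];
const campsiteRows = [
  { country: 'usa', state: 'new-york' },
  { country: 'usa', state: 'california' },
];

// Single-param route: a flat array of strings, as in the config example.
const blogSlugs = slugRows.map((row) => row.slug);

// Multi-param route: an array of arrays, with values in the same order as
// the params appear in the route (`[country]` first, then `[state]`).
const campsites = campsiteRows.map((row) => [row.country, row.state]);

console.log(blogSlugs); // ['hello-world', 'another-post']
console.log(campsites); // [['usa', 'new-york'], ['usa', 'california']]
```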

Going in the other direction, i.e. when loading data for a component for your UI, your database query should typically lowercase both the URL param and value in the database during comparison–e.g.:

-- Use parameterized queries (or otherwise escape your `params` values) to prevent SQL injection.
SELECT * FROM campsites WHERE LOWER(country) = LOWER(params.country) AND LOWER(state) = LOWER(params.state) LIMIT 10;

Example output

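```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example/</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/about</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/blog</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/login</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/pricing</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/privacy</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/signup</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/support</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/terms</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/blog/hello-world</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/blog/another-post</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/blog/tag/red</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/blog/tag/green</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/blog/tag/blue</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/campsites/usa/new-york</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/campsites/usa/california</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/campsites/canada/toronto</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://example/foo.pdf</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>
```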

Changelog

Contributing

git clone https://github.com/jasongitmail/super-sitemap.git
cd super-sitemap
bun install
# Then edit files in `/src/lib`

Publishing

A new version of this npm package is automatically published when the semver version within package.json is incremented.

Credits