vercel / next.js

The React Framework
https://nextjs.org
MIT License

Generating sitemap #15508

Closed: colinhacks closed this issue 3 years ago

colinhacks commented 4 years ago

Feature request

My site contains several dynamic routes and uses the getStaticPaths and getStaticProps hooks from the 9.3 release. I'd like to generate a sitemap for it. It's quite hard.

Describe the solution you'd like

I'd like some way of accessing the list of paths/pages generated during the build step. The information I need is printed to the CLI during "next build" but there's no way (I think) for the developer to tap into it:

[Screenshot, 2020-07-26: the list of generated pages printed by "next build"]

Describe alternatives you've considered

There are several other issues on here describing incomplete solutions that only work for static pages. My site has several dynamic routes (e.g. [blog].tsx). I'm not aware of any solution that works.

Additional context

Next.js is 🔥, this is the one thing tripping me up 🤙

noobnoobdc137 commented 4 years ago

It would be awesome if nextjs supports this out of the box 🚀

darshkpatel commented 4 years ago

What would be the ideal format for the sitemap? Would a simple JSON file containing all the routes do? I can whip up a PR for it.

rokinsky commented 4 years ago

A solution based on the build manifest is not ideal, because pages that use getServerSideProps, getInitialProps, or getStaticProps with the fallback option would not be covered. It is much better to generate the sitemap on the fly (this approach has been described in discussions multiple times), because only you know which URLs your web application actually serves.

Here is my boilerplate from pages/sitemap.xml.jsx:

```jsx
import { renderToStaticMarkup } from "react-dom/server";

// The page component renders nothing; the XML is written straight to the response.
const SitemapIndex = () => null;

const Sitemap = ({ pages, origin }) => {
  /*
   * NOTE: the <?xml ?> preamble from the spec is optional:
   *  UTF-8 is the default encoding
   *  version 1.0 is the default
   */
  return (
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      {pages?.map((page, index) => {
        return (
          <url key={index}>
            <loc>{[origin, page?.fullSlug].join("/")}</loc>
            {/* Skip <lastmod> when no date is known instead of emitting an empty tag */}
            {page?.publishedAt && <lastmod>{page.publishedAt}</lastmod>}
          </url>
        );
      })}
    </urlset>
  );
};

export const getServerSideProps = async ({ res }) => {
  const pages = []; // TODO: fetch your pages, e.g. [{ fullSlug, publishedAt }]
  const origin = "https://example.com"; // TODO: place your origin

  res.setHeader("Content-Type", "text/xml");
  res.write(renderToStaticMarkup(<Sitemap pages={pages} origin={origin} />));
  res.end();

  return {
    props: {},
  };
};

export default SitemapIndex;
```

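If you'd rather not pull react-dom/server into the route, the same <urlset> can be built as a plain string. In that case you have to do the escaping React otherwise does for you: the sitemaps.org protocol requires &, ', ", < and > in values to be entity-escaped. A minimal sketch, assuming the same { fullSlug, publishedAt } page shape as the boilerplate above; buildSitemapXml and escapeXml are hypothetical names, not part of Next.js:

```javascript
// Hypothetical helper: entity-escape the five characters the sitemap protocol requires.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: build a <urlset> document without React.
// pages: array of { fullSlug, publishedAt? } (assumed shape)
function buildSitemapXml(pages, origin) {
  const urls = pages.map((page) => {
    const loc = `<loc>${escapeXml([origin, page.fullSlug].join("/"))}</loc>`;
    // Only emit <lastmod> when a date is available
    const lastmod = page.publishedAt
      ? `<lastmod>${escapeXml(page.publishedAt)}</lastmod>`
      : "";
    return `<url>${loc}${lastmod}</url>`;
  });
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
    urls.join("") +
    "</urlset>"
  );
}
```

The returned string can be passed to res.write() in place of the renderToStaticMarkup(...) call.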
francisrod01 commented 4 years ago

I think this is related: How to create a routes list component

colinhacks commented 4 years ago

@rokinsky good point. It would be great for there to be a solution that works for the classic SSG/getStaticPaths approach, though, which is an increasingly popular usage pattern among Next.js users.

iamvishnusankar commented 4 years ago

@vriad Check out the with-next-sitemap example. It might be the solution you're looking for.

arvigeus commented 4 years ago

@iamvishnusankar Your solution is great, but will not work if there are mixed static/dynamic pages like:

iamvishnusankar commented 4 years ago

EDIT

next-sitemap now supports dynamic/server-side sitemap generation.

--- OLD ---

@arvigeus Yes, you're correct. next-sitemap won't work for dynamic pages rendered using getServerSideProps (it does work for getStaticProps, though).

That's because the package relies on the build and pre-render manifest files, which are only available after next build.

Also, if you're parsing the same set of posts every time, wouldn't it be better to pre-render them to reduce server load (or use Incremental Static Regeneration)?
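For the getStaticProps case, a postbuild script can read the concrete paths straight out of the build output, which is the kind of data the manifest-based approach works from. The sketch below assumes .next/prerender-manifest.json lists generated paths as the keys of its routes object; that file is internal and undocumented, so treat its shape as an assumption that may change between Next.js versions. prerenderedPaths and readPrerenderedPaths are hypothetical helpers:

```javascript
// Hypothetical postbuild helper: list pre-rendered paths from the Next.js build output.
// ASSUMPTION: .next/prerender-manifest.json is internal/undocumented; its "routes"
// object is assumed to be keyed by the concrete paths produced at build time.
const fs = require("fs");
const path = require("path");

function prerenderedPaths(manifest) {
  return Object.keys(manifest.routes || {})
    // Skip internal and error pages, which don't belong in a sitemap
    .filter((route) => !route.startsWith("/_") && route !== "/404" && route !== "/500")
    .sort();
}

function readPrerenderedPaths(distDir = ".next") {
  const file = path.join(distDir, "prerender-manifest.json");
  return prerenderedPaths(JSON.parse(fs.readFileSync(file, "utf8")));
}
```

Run it after next build (for example from a postbuild npm script). Pages using getServerSideProps, or fallback pages not generated at build time, still won't show up, which is the same limitation described above.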

arvigeus commented 4 years ago

My proposal would be:

On build, check whether /pages/sitemap.xml.js exists. If it does not, skip sitemap generation.

If the file contains getStaticProps, the function receives one parameter: a list of objects with data for each sitemap item. It returns an array of the same shape, which allows entries to be modified, added, or omitted. On build, the result is written to sitemap.xml (and robots.txt) in the output directory.

If the file contains getServerSideProps, the behavior is mostly the same, except that it runs server-side on every request, which allows more pages to be loaded asynchronously. The response type is set to XML.

If a page is named like [foo].js, check for getStaticPaths and populate the entries from it. If getStaticPaths is missing, display a warning.

Bonus: allow partial sitemaps using arbitrary files with specific exports.
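The [foo].js step of this proposal can be approximated in a userland build script today: take the object a page's getStaticPaths returns ({ paths, fallback }, where each path is either a string or a { params } object) and substitute each params entry into the route's file-name pattern. A rough sketch; expandStaticPaths is a hypothetical helper, and optional catch-all routes ([[...slug]]) are deliberately not handled:

```javascript
// Hypothetical helper: turn a getStaticPaths() result into concrete URL paths.
// routePattern uses Next.js file-name syntax, e.g. "/blog/[slug]" or "/docs/[...slug]".
// ASSUMPTION: optional catch-alls ([[...slug]]) are out of scope for this sketch.
function expandStaticPaths(routePattern, { paths }) {
  return paths.map((entry) => {
    // getStaticPaths may return plain path strings instead of { params } objects
    if (typeof entry === "string") return entry;
    return routePattern.replace(/\[(\.\.\.)?(\w+)\]/g, (_, catchAll, name) => {
      const value = entry.params[name];
      // Catch-all params are arrays of path segments
      return catchAll
        ? value.map(encodeURIComponent).join("/")
        : encodeURIComponent(value);
    });
  });
}
```

For example, expandStaticPaths("/blog/[slug]", await getStaticPaths()) yields one path per post, which can then be prefixed with your origin and fed into a sitemap.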

lachlanjc commented 3 years ago

Excellent blogpost on the topic: https://leerob.io/blog/nextjs-sitemap-robots

Related: #9824

marcofranssen commented 3 years ago

What is the status of this feature request?

I'd love to be able to generate these statically as well as with server-side rendering.

lachlanjc commented 3 years ago

It's totally possible to do manually, either at build time (like https://github.com/hackclub/v3/blob/main/lib/sitemap.js) or by making an API route that generates similar output and adding a rewrite in next.config.js for its path. I agree it'd be awesome to make this easier, though.
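A minimal sketch of the build-time variant: generate the XML once and write it into public/, whose files Next.js serves from the site root, so public/sitemap.xml ends up at /sitemap.xml. The writeSitemap name and where the URL list comes from are assumptions; run it before next build (e.g. from a prebuild npm script) so the file is in place when the app is built and deployed:

```javascript
// Hypothetical build-time script: write a static sitemap into public/.
const fs = require("fs");
const path = require("path");

// Entity-escape values for XML ("&" is common in query strings)
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// urls: absolute URLs collected however your build gets its content (assumed input)
function writeSitemap(urls, outFile = path.join("public", "sitemap.xml")) {
  const body = urls
    .map((loc) => `  <url><loc>${escapeXml(loc)}</loc></url>`)
    .join("\n");
  const xml =
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    body +
    "\n</urlset>\n";
  fs.mkdirSync(path.dirname(outFile), { recursive: true });
  fs.writeFileSync(outFile, xml);
  return outFile;
}
```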

typeofweb commented 3 years ago

We kind of solved this problem by using Next.js API routes with a high s-maxage value. It certainly does the job!
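Roughly what that can look like, as a sketch: an API route that renders the sitemap on request and sets Cache-Control with s-maxage, so a shared cache such as a CDN serves the cached copy instead of invoking the route every time. getUrls is a hypothetical placeholder for your data source, the URLs are assumed not to need XML escaping, and the one-day s-maxage is an arbitrary value:

```javascript
// Hypothetical data source; replace with your own query.
async function getUrls() {
  return ["https://example.com/", "https://example.com/blog/hello"]; // TODO: fetch real URLs
}

// Next.js API route handler (e.g. pages/api/sitemap.js)
async function handler(req, res) {
  const urls = await getUrls();
  const xml =
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
    urls.map((loc) => `<url><loc>${loc}</loc></url>`).join("") +
    "</urlset>";

  res.setHeader("Content-Type", "application/xml");
  // s-maxage only applies to shared caches (CDNs); browsers ignore it.
  // stale-while-revalidate lets the cache serve the old copy while it refreshes.
  res.setHeader("Cache-Control", "public, s-maxage=86400, stale-while-revalidate=3600");
  res.status(200).send(xml);
}
```

In the route file, export it with export default handler; to serve it at /sitemap.xml, add a rewrite in next.config.js.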

marcofranssen commented 3 years ago

@mmiszy @lachlanjc how would I expose my sitemap at <marcofranssen.nl/sitemap.xml> using your approach?

I'm in the process of migrating my existing blog, so I'd like to stick as closely as possible to the current structure for SEO.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://marcofranssen.nl/post-sitemap.xml</loc>
        <lastmod>2020-12-05T09:54:22.154Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://marcofranssen.nl/page-sitemap.xml</loc>
        <lastmod>2019-12-19T11:51:29.151Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://marcofranssen.nl/category-sitemap.xml</loc>
        <lastmod>2020-12-05T09:54:22.154Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://marcofranssen.nl/tag-sitemap.xml</loc>
        <lastmod>2020-12-05T09:54:22.154Z</lastmod>
    </sitemap>
</sitemapindex>
<!-- XML Sitemap generated by Hexo SEO Friendly Sitemap Generator -->
```
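The index above is plain XML, so any of the approaches in this thread (an API route, a getServerSideProps page, or a build script) can emit it. A sketch of a string builder for that format, assuming each child sitemap is described by a hypothetical { loc, lastmod } object and that the loc values need no XML escaping:

```javascript
// Hypothetical helper: build a <sitemapindex> in the same shape as the Hexo output above.
// sitemaps: array of { loc, lastmod }, where lastmod is a Date or an ISO 8601 string
function buildSitemapIndex(sitemaps) {
  const entries = sitemaps
    .map(({ loc, lastmod }) => {
      // Normalize Date objects to the ISO 8601 form the protocol expects
      const date = lastmod instanceof Date ? lastmod.toISOString() : lastmod;
      return `  <sitemap>\n    <loc>${loc}</loc>\n    <lastmod>${date}</lastmod>\n  </sitemap>`;
    })
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries +
    "\n</sitemapindex>\n"
  );
}
```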

I'm also looking for a similar solution for my RSS/Atom feed.

For example, what if I create the following:

pages/api/sitemap.xml.js
pages/api/tag-sitemap.xml.js
pages/api/category-sitemap.xml.js
pages/api/page-sitemap.xml.js
pages/api/post-sitemap.xml.js
pages/api/atom.xml.js

How do I rewrite those URLs to:

https://marcofranssen.nl/sitemap.xml
https://marcofranssen.nl/tag-sitemap.xml
https://marcofranssen.nl/category-sitemap.xml
https://marcofranssen.nl/page-sitemap.xml
https://marcofranssen.nl/post-sitemap.xml
https://marcofranssen.nl/atom.xml

For the Nextjs team:

It would be great if regular pages could also be used to generate non-HTML responses, so these can benefit from the same SSR and static rendering. Maybe adding a parameter could make this possible without breaking backward compatibility.

typeofweb commented 3 years ago

@marcofranssen you can set up rewrites in your next.config.js

lachlanjc commented 3 years ago

@marcofranssen In your next.config.js:

```js
module.exports = {
  async rewrites() {
    return [
      {
        source: '/sitemap.xml',
        destination: '/api/sitemap',
      },
      // add more here
    ]
  }
}
```

More on rewrites: https://nextjs.org/docs/api-reference/next.config.js/rewrites

iamvishnusankar commented 3 years ago

next-sitemap now supports static/pre-rendered/dynamic/server-side sitemap(s) & robots.txt generation.