vercel / next.js

The React Framework
https://nextjs.org
MIT License

Next 13 - Sitemap can't fetch on Google Search Console #51649

Closed anthonyjacquelin closed 1 month ago

anthonyjacquelin commented 1 year ago

Verify canary release

Provide environment information

Operating System:
      Platform: darwin
      Arch: x64
      Version: Darwin Kernel Version 22.1.0: Sun Oct  9 20:14:54 PDT 2022; root:xnu-8792.41.9~2/RELEASE_X86_64
    Binaries:
      Node: 18.16.0
      npm: 9.5.1
      Yarn: 1.22.19
      pnpm: 7.29.3
    Relevant packages:
      next: 13.4.6
      eslint-config-next: 13.2.4
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 4.9.5

Which area(s) of Next.js are affected? (leave empty if unsure)

App directory (appDir: true)

Link to the code that reproduces this issue or a replay of the bug

https://codesandbox.com

To Reproduce

export default async function sitemap() {
  const db = await connecToDatabase();
  const usersCollection = db.collection("Users");

  // articles
  const articles = await API.getPosts(
    "",
    undefined,
    undefined,
    "published"
  )
    .then((res) => res)
    .catch((error) => console.log("error fetching content"));
  const articleIds = articles?.map((article: Article) => {
    return { id: article?._id, lastModified: article?.createdAt };
  });
  const posts = articleIds.map(({ id, lastModified }) => ({
    url: `${URL}/${id}`,
    lastModified: lastModified,
  }));

  // users
  const profiles = await usersCollection.find({}).toArray();
  const users = profiles
    ?.filter((profile: User) => profile?.userAddress)
    ?.map((profile: User) => {
      return {
        url: `${URL}/profile/${profile.userAddress}`,
        lastModified: new Date().toISOString(),
      };
    });

  // tags
  const tagsFromDb = await articles
    ?.map((article: Article) => article?.categories)
    ?.flat();

  const uniqueTags = tagsFromDb.reduce((acc, tag) => {
    const existingTag = acc.find((item) => item.id === tag.id);

    if (!existingTag) {
      acc.push(tag);
    }

    return acc;
  }, []);

  const tags = uniqueTags
    ?.filter((tag) => tag?.id)
    ?.map((tag) => {
      return {
        url: `${URL}/tags/${tag.id}`,
        lastModified: new Date().toISOString(),
      };
    });

  const staticPages = [
    {
      url: `${URL}`,
      lastModified: new Date().toISOString(),
    },
    { url: `${URL}/about`, lastModified: new Date().toISOString() },
    { url: `${URL}/read`, lastModified: new Date().toISOString() },
  ];

  return [...posts, ...users, ...tags, ...staticPages];
}

Describe the Bug

Hello,

I'm using Next 13 with the /app directory and trying to configure the sitemap of my project on Google search console.

I have used the documentation as described there: Documentation

I have a sitemap.ts in the root of my /app directory, but GSC does not seem to recognize it. I know the sitemap is valid: URL, and I've also checked it using this tool

Xnapper-2023-06-22-13 15 56

Expected Behavior

I want the /sitemap.xml to be recognized by Google search console.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

MrTob commented 7 months ago

same error here

andyechc commented 7 months ago

I use React with Vite, with sitemap.xml in the public folder, and I get the same error: "Couldn't fetch" for https://studio20.vercel.app/sitemap.xml. So I tried https://studio20.vercel.app/sitemap.xml/ and it still doesn't work. I renamed the sitemap file to sitemap-1.xml and that doesn't work either. I have no idea what to do.

ruchernchong commented 7 months ago

Interesting. This issue is not related to Vercel, but it is interesting to see that React with Vite is not working either.

andyechc commented 7 months ago

OK, after making all those changes to my sitemap, I renamed the file back to sitemap.xml. Today, when I opened the Search Console, I saw that the first submission of my sitemap had been accepted and was waiting to be indexed. This suggests that the problem is on Google's side or somewhere else; honestly, I am confused. A friend who uses Flutter also had fetch problems with his sitemap, but when he later went back to check it in the Search Console, the sitemap had been read correctly and everything was in order. I think Google has to fix that, it's really crazy.

lydhr commented 7 months ago

It seems to be an issue with Google Search Console. My solution was to submit the sitemap URL in Google Search Console with a suffix: yourwebsite.com/sitemap.xml?sitemap=1 instead of yourwebsite.com/sitemap.xml. Reference

FYI, I used next-sitemap package to auto generate my sitemap.xml at every new build.
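
The suffix workaround above can be sketched as a small helper that builds the alternative URLs to submit. This is only an illustration: `sitemapUrlVariants` is a hypothetical name, and the trailing-slash variant is the other form tried in this thread, not something Search Console documents.

```typescript
// Hypothetical helper: builds alternative sitemap URLs to submit in
// Search Console when the plain URL is stuck on "Couldn't fetch".
function sitemapUrlVariants(sitemapUrl: string): string[] {
  // Variant 1: same path with a query parameter, e.g. ?sitemap=1
  const withQuery = new URL(sitemapUrl);
  withQuery.searchParams.set("sitemap", "1");

  // Variant 2: same path with a trailing slash
  const withSlash = new URL(sitemapUrl);
  if (!withSlash.pathname.endsWith("/")) {
    withSlash.pathname += "/";
  }

  return [withQuery.href, withSlash.href];
}

console.log(sitemapUrlVariants("https://yourwebsite.com/sitemap.xml"));
```

Both variants should serve the same XML as the original URL; the point is only to give Search Console a URL it has not tried before.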

Shubham-EV commented 6 months ago

Was anyone able to solve it?

ruchernchong commented 6 months ago

No. Problem might lie with Google Search Console instead.

Tyerlo commented 6 months ago

I had this problem for a long time but finally resolved it. I made the files static and put them inside the public folder: a sitemap.xml and a robots.txt.

I also have a middleware that handles languages on routes, and I had to add sitemap.xml and robots.txt to the matcher exclusions:

export const config = {
    // Matcher ignoring `/_next/` and `/api/`
    matcher: [
        "/((?!api|_next/static|_next/image|sitemap.xml|robots.txt|favicon.ico).*)" //sitemap.xml and robots.txt to make it to work
    ]
};

Make sure that robots.txt can be crawled: in Google Search Console, go to Settings, then Crawling, then robots.txt, open the report, click on the three dots, and request a recrawl. After that I could add the sitemap.xml and it succeeded.

Shubham-EV commented 6 months ago

@Tyerlo where did you put this code and what is this file name?

Tyerlo commented 6 months ago

I have a middleware.ts file that handles languages on routes.

Here's the entire file

import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

import { i18n } from "./i18n.config";

import { match as matchLocale } from "@formatjs/intl-localematcher";
import Negotiator from "negotiator";

function getLocale(request: NextRequest): string | undefined {
    const negotiatorHeaders: Record<string, string> = {};
    request.headers.forEach((value, key) => (negotiatorHeaders[key] = value));

    // @ts-ignore locales are readonly
    const locales: string[] = i18n.locales;
    const languages = new Negotiator({ headers: negotiatorHeaders }).languages();

    const locale = matchLocale(languages, locales, i18n.defaultLocale);
    return locale;
}

export function middleware(request: NextRequest) {
    const pathname = request.nextUrl.pathname;
    const pathnameIsMissingLocale = i18n.locales.every(
        (locale) => !pathname.startsWith(`/${locale}/`) && pathname !== `/${locale}`
    );
    const isExcludedPath = pathname.startsWith("/img/");
    if (isExcludedPath) {
        // Skip i18n modification for images or other excluded paths
        return;
    }
    // Redirect if there is no locale
    if (pathnameIsMissingLocale) {
        const locale = getLocale(request);

        if (locale === i18n.defaultLocale) {
            return NextResponse.rewrite(
                new URL(
                    `/${locale}${pathname.startsWith("/") ? "" : "/"}${pathname}`,
                    request.url
                )
            );
        }
        return NextResponse.redirect(
            new URL(
                `/${locale}${pathname.startsWith("/") ? "" : "/"}${pathname}`,
                request.url
            )
        );
    }
}

export const config = {
    // Matcher ignoring `/_next/` and `/api/`
    matcher: [
        "/((?!api|_next/static|_next/image|sitemap.xml|robots.txt|favicon.ico).*)"
    ]
};

baxsm commented 5 months ago

I just found out that the sitemap.xml is returning HTTP 304 instead of HTTP 200.

robots.txt returns HTTP 200 and is working fine.

Really need @leerob @amyegan to chime in on this.

That might not be the cause. A 304 is not an error: it means "Not Modified", i.e. the browser is reusing its cached copy. Try Ctrl+F5 and you'll get a 200.
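
To see what a client with no cached copy (such as a crawler on its first visit) receives, one option is to request the sitemap while asking caches to revalidate. A minimal sketch, assuming Node 18+ with a global `fetch`; `describeStatus` and `checkSitemap` are hypothetical names used only for illustration:

```typescript
// Hypothetical helper: turn an HTTP status code into a short diagnosis.
function describeStatus(status: number): string {
  if (status === 304) return "304 Not Modified: cached copy revalidated, not an error";
  if (status >= 200 && status < 300) return `${status}: fetched successfully`;
  if (status >= 300 && status < 400) return `${status}: redirected, check the final URL`;
  return `${status}: the sitemap could not be fetched`;
}

// Hypothetical helper: request the sitemap without following redirects.
// Assumes a Node 18+ runtime, where `redirect: "manual"` exposes the 3xx status.
async function checkSitemap(url: string): Promise<string> {
  const res = await fetch(url, {
    headers: { "Cache-Control": "no-cache" },
    redirect: "manual",
  });
  return describeStatus(res.status);
}
```

Calling `checkSitemap` with your own sitemap URL and logging the result shows whether the server answers with a plain 200 when nothing is cached.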

bouia commented 4 months ago

I added a trailing slash to my sitemap URL (/sitemap.xml/) and it started to work. Both links load fine in the browser, and Google managed to pick it up: https://ruchern.xyz/sitemap.xml/

Next.js 14 App dir here, had the same issue here where Google just says couldn't fetch.

Switched from using sitemap.ts to a sitemap.xml/route.ts to render out my sitemap but no difference.

But adding the trailing slash worked

Adding a trailing slash worked for me as well, thank you!

haoolii commented 3 months ago

I have the same issue.

Edit by maintainer bot: Comment was automatically minimized because it was considered unhelpful. (If you think this was by mistake, let us know). Please only comment if it adds context to the issue. If you want to express that you have the same problem, use the upvote 👍 on the issue description or subscribe to the issue for updates. Thanks!

devjiwonchoi commented 2 months ago

Hey everyone, seems like the latest canary works, please try and lmk!

@anthonyjacquelin Please reopen if you face the same issue, but with a valid repro link (current is just codesandbox.com). Thank you!

JannikZed commented 2 months ago

@devjiwonchoi can you let me know exactly which version you tested? Can we also find out what has changed, so that we fully understand this issue? Bing seems to read and process the sitemap, so I would really like to understand what is different, since we also can't just roll out canary versions in production.

devjiwonchoi commented 2 months ago

exactly which version you tested and can we find out, what has been changed to fully understand this issue

@JannikZed You are absolutely right, will investigate the root cause of this issue!

devjiwonchoi commented 1 month ago

Hey everyone, I've investigated the issue and want to share the result.

TL;DR

Most of the issues mentioned in this thread are caused by a misconfigured middleware matcher, sitemap, or robots file. Also, the tip from John Mueller (adding a trailing slash or a query param to make Google Search refresh its fetch) worked.

Middleware Matcher Configuration

If you have middleware in your project and have not yet excluded the sitemap in the matcher config, requests for the sitemap will pass through the middleware, which may cause an issue when it is accessed. You can try excluding the sitemap path from your middleware. We will update the docs to give clearer guidance on this.

export const config = {
  matcher: [
    /*
     * Match all request paths except for the ones starting with:
     * - _next/static (static files)
     * - _next/image (image optimization files)
     * - favicon.ico, sitemap.xml, robots.txt (metadata files)
     */
    '/((?!_next/static|_next/image|favicon.ico|sitemap.xml|robots.txt).*)',
  ],
}
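
As a quick sanity check of the pattern above, it can be anchored and tested as a plain regular expression. This is only an approximation: Next.js compiles matchers with its own path-matching logic, so the sketch below (with the hypothetical helper `runsMiddleware`) shows the intent of the negative lookahead, not the framework's exact behavior.

```typescript
// The matcher pattern from the config above.
const matcher =
  "/((?!_next/static|_next/image|favicon.ico|sitemap.xml|robots.txt).*)";

// Approximation: anchor the pattern and treat it as a plain RegExp.
const matcherRegex = new RegExp(`^${matcher}$`);

// Hypothetical helper: true if the middleware would run for this pathname.
function runsMiddleware(pathname: string): boolean {
  return matcherRegex.test(pathname);
}

for (const path of ["/", "/about", "/sitemap.xml", "/robots.txt", "/_next/static/app.js"]) {
  console.log(path, runsMiddleware(path) ? "middleware runs" : "skipped");
}
```

With this pattern, ordinary pages still go through the middleware, while the sitemap, robots.txt, and static assets are skipped and therefore cannot be rewritten or redirected by it.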

General Sitemap Misconfiguration

This includes:

  1. Blocked by robots.txt
  2. Invalid URL (or redirected)
  3. Google needs to re-fetch
  4. General errors that need awaiting
  5. and more.
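
For the first item, a rough way to check whether robots.txt blocks the sitemap path is sketched below. It is deliberately simplified and is not a full robots.txt parser: it only handles plain prefix `Disallow` rules, ignores `Allow` precedence and wildcards, and treats a `User-agent: *` group as applying to every agent. `isDisallowed` is a hypothetical name.

```typescript
// Simplified check: does a robots.txt body disallow `path` for `userAgent`?
// Assumptions: prefix matching only, no wildcards, no Allow precedence.
function isDisallowed(robotsTxt: string, path: string, userAgent = "*"): boolean {
  let groupApplies = false; // does the current group target our agent?
  let readingAgents = false; // are we inside a run of User-agent lines?

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      if (!readingAgents) groupApplies = false;
      readingAgents = true;
      const agent = value.toLowerCase();
      if (agent === "*" || agent === userAgent.toLowerCase()) groupApplies = true;
    } else {
      readingAgents = false;
      // An empty Disallow value means "allow everything".
      if (field === "disallow" && groupApplies && value !== "" && path.startsWith(value)) {
        return true;
      }
    }
  }
  return false;
}

const robots = "User-agent: *\nDisallow: /private\n";
console.log(isDisallowed(robots, "/sitemap.xml"));
```

If this returns true for /sitemap.xml with your own robots.txt contents, the sitemap is likely being blocked before Google even tries to read it.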

Based on the tip from John Mueller, editing the submitted URL may do the trick, which is why the workarounds in the comments (adding a trailing slash or a query param) did work.

Also, if you click the "Couldn't fetch" element, you can view why it wasn't fetched.

Example of "Couldn't fetch" caused by an HTTP 404

Screenshot 2024-08-16 at 3 07 32 PM

@JannikZed

next info of my device

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6030
  Available memory (MB): 36864
  Available CPU cores: 12
Binaries:
  Node: 18.18.0
  npm: 9.8.1
  Yarn: N/A
  pnpm: 9.5.0
Relevant Packages:
  next: 14.2.5 // Latest available version is detected (14.2.5).
  eslint-config-next: N/A
  react: 18.3.1
  react-dom: 18.3.1
  typescript: 5.5.4
Next.js Config:
  output: N/A

Screenshot 2024-08-21 at 6 28 59 PM

https://jiwonchoi.dev/sitemap.xml (check out the lastMod date)

If you still have issues, please open a new issue with a reproduction. Thank you!

c100k commented 1 month ago

@devjiwonchoi thanks for your investigation. But there is definitely another problem. I curl-ed your website and compared it to mine, and the only difference is that yours uses HTTP/2 while mine uses HTTP/1.1. I'm pretty sure that is not the cause, though.

Also, the Google console does not give any details on why it failed. Unlike in your screenshots, there is no caret with details on my side.

Since you mentioned the middleware, could having the Next.js app served by an Express server via the '*' last route cause the same issue?

For those still having the issue, can you 👍🏽 if you have an Express server and 👎🏽 if you don't?

JannikZed commented 1 month ago

@c100k I have to say that it got solved for me. In my case, Google also gave no reason for the "Couldn't fetch". Apparently my domain was just too new. After some time, and after adding one high-value backlink, it simply went green.

github-actions[bot] commented 4 weeks ago

This closed issue has been automatically locked because it had no new activity for 2 weeks. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.