vercel / next.js

The React Framework
https://nextjs.org
MIT License

Next 13 - Sitemap can't fetch on Google Search Console #51649

Closed anthonyjacquelin closed 1 month ago

anthonyjacquelin commented 1 year ago

Verify canary release

Provide environment information

Operating System:
      Platform: darwin
      Arch: x64
      Version: Darwin Kernel Version 22.1.0: Sun Oct  9 20:14:54 PDT 2022; root:xnu-8792.41.9~2/RELEASE_X86_64
    Binaries:
      Node: 18.16.0
      npm: 9.5.1
      Yarn: 1.22.19
      pnpm: 7.29.3
    Relevant packages:
      next: 13.4.6
      eslint-config-next: 13.2.4
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 4.9.5

Which area(s) of Next.js are affected? (leave empty if unsure)

App directory (appDir: true)

Link to the code that reproduces this issue or a replay of the bug

https://codesandbox.com

To Reproduce

export default async function sitemap() {
  const db = await connecToDatabase();
  const usersCollection = db.collection("Users");

  // articles
  const articles = await API.getPosts(
    "",
    undefined,
    undefined,
    "published"
  ).catch((error) => console.log("error fetching content"));
  const articleIds = articles?.map((article: Article) => {
    return { id: article?._id, lastModified: article?.createdAt };
  });
  // Guard against a failed fetch leaving articleIds undefined
  const posts = (articleIds ?? []).map(({ id, lastModified }) => ({
    url: `${URL}/${id}`,
    lastModified: lastModified,
  }));

  // users
  const profiles = await usersCollection.find({}).toArray();
  const users = profiles
    ?.filter((profile: User) => profile?.userAddress)
    ?.map((profile: User) => {
      return {
        url: `${URL}/profile/${profile.userAddress}`,
        lastModified: new Date().toISOString(),
      };
    });

  // tags
  const tagsFromDb = articles
    ?.map((article: Article) => article?.categories)
    ?.flat();

  const uniqueTags = (tagsFromDb ?? []).reduce((acc, tag) => {
    const existingTag = acc.find((item) => item.id === tag.id);

    if (!existingTag) {
      acc.push(tag);
    }

    return acc;
  }, []);

  const tags = uniqueTags
    ?.filter((tag) => tag?.id)
    ?.map((tag) => {
      return {
        url: `${URL}/tags/${tag.id}`,
        lastModified: new Date().toISOString(),
      };
    });

  const staticPages = [
    {
      url: `${URL}`,
      lastModified: new Date().toISOString(),
    },
    { url: `${URL}/about`, lastModified: new Date().toISOString() },
    { url: `${URL}/read`, lastModified: new Date().toISOString() },
  ];

  return [...posts, ...users, ...tags, ...staticPages];
}

Describe the Bug

Hello,

I'm using Next 13 with the /app directory, and I'm trying to configure the sitemap of my project in Google Search Console.

I have followed the documentation as described here: Documentation

I have a sitemap.ts at the root of my /app directory, but it doesn't seem to be recognized by GSC, even though I know the sitemap is valid: URL. I've also checked it using this tool.


Expected Behavior

I want the /sitemap.xml to be recognized by Google search console.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

SuttonJack commented 1 year ago

Do you have a file at app/robots.ts? See here for an example.

This file lets engines and crawlers know where to find your sitemap. You can read more about it here

ryuji-orca commented 1 year ago

Do you have a file at app/robots.ts? See here for an example.

This file lets engines and crawlers know where to find your sitemap. You can read more about it here

Same issue. I have enabled the sitemap and added the following code to app/robots.ts, but I still can't register the sitemap.

import type { MetadataRoute } from "next"

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
      },
    ],
    sitemap: "https://my-url.xyz/sitemap.xml",
    host: "https://my-url.xyz",
  }
}

Maybe some more time needs to pass, so I'll give it a little more time.

JasonA-work commented 1 year ago

Have you tried putting it in the /public folder instead?

ryuji-orca commented 1 year ago

Have you tried putting it in the /public folder instead?

I've tried, but no luck... I don't think public has anything to do with it, because the official site and leerob's site both put their sitemap and robots files in app/.

https://github.com/vercel/commerce/blob/70dcfa9736bb2067713a425e17ee6e59fb3fca2b/app/sitemap.ts#L8 https://github.com/leerob/leerob.io/blob/main/app/sitemap.ts

anthonyjacquelin commented 1 year ago

Has anyone solved this issue? I'm still stuck on it without any lead on a possible solution...

CJEnright commented 1 year ago

Also experiencing this. With appdir, a sitemap.xml file placed in public cannot be parsed by Google Search Console. Falling back to pages and following this older tutorial does work.

anthonyjacquelin commented 1 year ago

Also experiencing this. Putting a sitemap.xml file in public with appdir cannot be parsed by Google Search Console. Falling back to pages and following this older tutorial does work.

I tried, but even though my new sitemap is valid, nothing changed...

loverphp487 commented 1 year ago

Has someone solved this issue ? I'm still stuck on it without any pieces of possible solution...

I did exactly what the Next.js documentation describes for robots.ts and sitemap.xml and have the same problem.

anthonyjacquelin commented 1 year ago

Has anyone managed to get past this error in any way? I still encounter the problem on my side.

octane96 commented 1 year ago

Next 13.4.7, I have the same problem. I can access the sitemap.xml path from my browser just fine...

octane96 commented 1 year ago

After saving the dynamically generated sitemap.xml from the browser and storing it in the public directory, Google Search Console was able to load it. I am not sure if this is a Next.js issue or a Google Search Console issue, but I will get by with this for now. Very disappointed...

anthonyjacquelin commented 1 year ago

After saving the dynamically generated sitemap.xml from the browser and storing it in the public directory, Google Search Console was able to load it.

I am not sure if this is a Next.js issue or a Google Search Console issue, but I will get by with this for now.

Very disappointed...

Thanks for your feedback, so at the end of the day this is not dynamic anymore...

octane96 commented 1 year ago

Thanks for your feedback, so at the end of the day this is not dynamic anymore...

That's right... I agree.

anthonyjacquelin commented 1 year ago

Thanks for your feedback, so at the end of the day this is not dynamic anymore...

That's right...

I agree.

So maybe we could create a cron API route that rewrites this sitemap.xml file every day or week using fs.

octane96 commented 1 year ago

Thanks for your feedback, so at the end of the day this is not dynamic anymore...

That's right... I agree.

So maybe we could create a cron api route that will write this sitemap.xml file every day or week using fs

Thanks, that's a very good idea! I've written a simple cron for now, so I'll get by with that for a while!
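A rough sketch of that cron idea, for reference. Hedged assumptions: `getAllUrls` is a hypothetical stand-in for whatever query produces your dynamic entries, and writing into public/ with fs only works where the filesystem is writable at runtime (on serverless hosts like Vercel you would instead regenerate at build time or push the file to storage):

```typescript
import { writeFileSync } from "node:fs";

type SitemapEntry = { url: string; lastModified: string };

// Build the XML body from a list of entries. Note the urlset namespace uses
// http://, not https:// — some parsers reject the https variant.
function buildSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries
    .map(
      (e) =>
        `  <url>\n    <loc>${e.url}</loc>\n    <lastmod>${e.lastModified}</lastmod>\n  </url>`
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}

// A cron-triggered handler (e.g. an app/api/cron/sitemap/route.ts) could then
// fetch fresh entries and overwrite the static file:
export async function regenerateSitemap(
  getAllUrls: () => Promise<SitemapEntry[]>
) {
  const entries = await getAllUrls();
  writeFileSync("public/sitemap.xml", buildSitemapXml(entries), "utf8");
}
```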

octane96 commented 1 year ago

This is a bit off topic, but it seems that sitemap.ts is built statically. Is that how it's supposed to be...?

If so, it does not have to be cron.

anthonyjacquelin commented 1 year ago

This is a bit off topic, but it seems that sitemap.ts is built static.

Is that how it is supposed to be...?

If so, it does not have to be cron.

I'm not sure that sitemap.xml has to be statically generated.

The most important thing is to have an up to date version of your sitemap if you have dynamic pages being created.

octane96 commented 1 year ago

The most important thing is to have an up to date version of your sitemap if you have dynamic pages being created.

I agree.

Sorry if I didn't communicate it well. Looking at my build log, the sitemap seems to be generated dynamically only at build time to begin with. I wish it would always be generated dynamically.
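For anyone wanting to try forcing per-request generation of app/sitemap.ts, a sketch follows. It assumes route segment config options (`dynamic`, `revalidate`) are honored for metadata routes in your Next.js version, which is worth verifying against the docs; the entry returned here is a placeholder for your own data source:

```typescript
// app/sitemap.ts — sketch of opting the sitemap out of static generation.
// The entry type is approximated inline so the snippet stands alone; in a
// real project you would use `import type { MetadataRoute } from "next"`.
type SitemapEntry = {
  url: string;
  lastModified?: string | Date;
};

// Route segment config: ask Next.js to evaluate this route on every request
// instead of once at build time.
export const dynamic = "force-dynamic";
export const revalidate = 0;

export default async function sitemap(): Promise<SitemapEntry[]> {
  // Hypothetical data source; replace with your own query.
  return [
    { url: "https://example.com", lastModified: new Date().toISOString() },
  ];
}
```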

lucas-soler commented 1 year ago

Hi guys!

I don't think it is an error, neither on Google's side nor on Vercel's. Or rather, I'm not sure it isn't some kind of error on Google's side, because I really think it should show a better message for this situation. You can read further info about this in the link below: https://support.google.com/webmasters/thread/184533703/are-you-seeing-couldn-t-fetch-reported-for-your-sitemap?hl=en&sjid=15254935347152386554-SA

I spent 30 minutes searching on the web thinking it was a problem.

anthonyjacquelin commented 1 year ago

Hi guys!

I think it is not an error. Neither on Google nor Vercel. Better saying, I'm not sure it is not kinda an error on Google, because I really think it should have a better message to this situation. You can read further info about this in the link below: https://support.google.com/webmasters/thread/184533703/are-you-seeing-couldn-t-fetch-reported-for-your-sitemap?hl=en&sjid=15254935347152386554-SA

I spent 30 minutes searching on the web thinking it was a problem.

If it were not a bug and just a matter of the time Google needs to process the sitemap, all our sitemaps would have been handled by Google after a while. The fact is that even after 1 month I still see "can't fetch".

So there might be a bigger problem than just a messy error message plus the time Google needs to handle it.

c100k commented 1 year ago

I'm not using the app folder (my sitemap.xml is a simple public file) and I had the same issue.

After waiting for almost a month, I tried something else: I created sitemap2.xml and it was fetched successfully. Both files are identical...


I think Google keeps some cache of a file and if it failed retrieving it once, it fails again and again. So probably not related to Next.js at all.

ryuji-orca commented 1 year ago

It's been quite a while since my first post, but my sitemap still hasn't been registered 😓.

The following issue states that after changing <xml version="1.0" encoding="UTF-8">...</xml> to <?xml version="1.0" encoding="UTF-8"?>, the sitemap got registered in the Search Console. However, in the case of the app directory's sitemap.xml, is it necessary to include the <?xml version="1.0" encoding="UTF-8"?> declaration? 🧐

Google Article

I've also noticed someone on X experiencing the same error as me. However, it seems to be happening with Remix as well, so it might be a problem on Google's side.
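As a quick sanity check for the declaration question above, here is a small generic helper (nothing Next.js-specific; a sketch you could run against whatever your deployment actually serves at /sitemap.xml) that flags the two gotchas raised in this thread — a malformed declaration and an https:// namespace:

```typescript
// Minimal sanity checks for a sitemap string — not full XML validation.
function checkSitemap(xml: string): string[] {
  const problems: string[] = [];
  // Gotcha 1: the declaration must be <?xml ... ?>, not <xml ...>...</xml>
  if (!xml.trimStart().startsWith('<?xml version="1.0" encoding="UTF-8"?>')) {
    problems.push("missing or malformed XML declaration");
  }
  // Gotcha 2: the urlset namespace is defined with http://, not https://
  if (xml.includes('xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"')) {
    problems.push("namespace should use http://, not https://");
  }
  return problems;
}

// Example: a well-formed minimal sitemap passes both checks.
const ok = checkSitemap(
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
);
```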

seogki commented 1 year ago

I ended up creating a sitemap.xml file in public and copy-pasting the sitemap.xml generated by Next.js into it.

Now Google is able to find my sitemap.xml...

But I still want the dynamic version too, because I don't want to copy-paste every time my sitemap changes.

anthonyjacquelin commented 1 year ago

Has anyone using the app directory managed to make it work?

iperdev commented 1 year ago

After encountering the same issues as you had, in my case with "next": "13.4.19" App Router and their native solution for sitemap (https://nextjs.org/docs/app/api-reference/file-conventions/metadata/sitemap#generate-a-sitemap)

I found this article and applied the Next.js 13.2 and lower solution proposed by the article https://claritydev.net/blog/nextjs-dynamic-sitemap-pages-app-directory#nextjs-132-and-lower

What happened? The route app/sitemap.xml/route.ts didn't work, and I suspected it might be due to caching by Google...

...so I tried app/sitemap2.xml/route.ts, and it worked (yep, same code...)

Now, sitemap2.xml is working properly in Google Search Console.

My sitemap.xml is still available with the same code, but Search Console is unable to fetch it. I removed it and added it again, and it's still not working. So, my plan is to remove it for some days or weeks and then try adding it again. At least, I'm indexing with sitemap2.xml, which is dynamic.

felri commented 1 year ago

After encountering the same issues as you had, in my case with "next": "13.4.19" App Router and their native solution for sitemap (https://nextjs.org/docs/app/api-reference/file-conventions/metadata/sitemap#generate-a-sitemap)

I found this article and applied the Next.js 13.2 and lower solution proposed by the article https://claritydev.net/blog/nextjs-dynamic-sitemap-pages-app-directory#nextjs-132-and-lower

What happened? The route app/sitemap.xml/route.ts didn't work, and I suspected it might be due to caching by Google...

...so I tried app/sitemap2.xml/route.ts, and it worked (yep, same code...)

Now, sitemap2.xml is working properly in Google Search Console.

My sitemap.xml is still available with the same code, but Search Console is unable to fetch it. I removed it and added it again, and it's still not working. So, my plan is to remove it for some days or weeks and then try adding it again. At least, I'm indexing with sitemap2.xml, which is dynamic.

I tried that

import { NextRequest } from "next/server";

export async function GET(req: NextRequest) {
  const sitemap: any = await getSitemap();

  // Note: the sitemap namespace must use http://, not https://
  const toXml = (urls: any) => `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls
  .map((item: any) => {
    return `
<url>
    <loc>${item.url}</loc>
    <lastmod>${item.lastModified}</lastmod>
    <changefreq>${item.changeFrequency}</changefreq>
    <priority>${item.priority}</priority>
</url>
    `;
  })
  .join('')}
</urlset>`;

  return new Response(toXml(sitemap), {
    status: 200,
    headers: {
      'Cache-control': 'public, s-maxage=86400, stale-while-revalidate',
      'content-type': 'application/xml'
    }
  });
}

But Google can't find it. I also tried sitemap2.xml inside the public folder and got the same error.

didyk commented 1 year ago

I have the same issue with the pages router. Google hasn't been able to fetch my sitemap for at least 6 months. I tried sitemap2.xml inside the public folder and it doesn't work either.

Has anyone had a successful experience adding a sitemap with the pages router?

ghost commented 11 months ago

https://www.jcchouinard.com/sitemap-could-not-be-read-couldnt-fetch-in-google-search-console/#:~:text=The%20%E2%80%9CSitemap%20could%20not%20be,the%20sitemap%20is%20not%20indexed

ruchernchong commented 11 months ago

I added a trailing slash to my sitemap and it started to work. Both links load fine in the browser.

/sitemap.xml/

And Google managed to pick it up.


https://ruchern.xyz/sitemap.xml/

ackshaey commented 9 months ago

In case it helps more people, @ruchernchong 's suggestion above worked and that basically confirms Google caches the failed sitemap.xml unless you change the path when resubmitting it. I confirmed Cloudflare was blocking the Bingbot and Googlebot on their basic plan, so had to turn off the Bot fighting mode and resubmit the sitemap.xml with the trailing slash to get it to work. What shocks me is that this is still a bug and no team at Google has bothered to fix it yet. It has to be hurting their search coverage.

ruchernchong commented 9 months ago

In case it helps more people, @ruchernchong 's suggestion above worked and that basically confirms Google caches the failed sitemap.xml unless you change the path when resubmitting it. I confirmed Cloudflare was blocking the Bingbot and Googlebot on their basic plan, so had to turn off the Bot fighting mode and resubmit the sitemap.xml with the trailing slash to get it to work. What shocks me is that this is still a bug and no team at Google has bothered to fix it yet. It has to be hurting their search coverage.

Indeed. This is really a weird bug that the team at Google does not seem to be resolving.

jacobhjkim commented 9 months ago

So this is not an issue with Next.js? Rather a weird cache strategy from Google Search Console team?

ruchernchong commented 9 months ago

So this is not an issue with Next.js? Rather a weird cache strategy from Google Search Console team?

Yes

qhanw commented 9 months ago

@ruchernchong I followed your solution, but the status still shows "couldn't fetch". After checking sitemap.xml, I can't find any other issues, and it can be accessed normally in the browser. Are there any other solutions?

ackshaey commented 9 months ago

@ruchernchong According to your solution, the status is still showing "couldn't fetch". After checking 'sitemap.xml', I don't find other issues and it can be accessed normally in the browser. Is there any other solutions?

@qhanw have you confirmed traffic from Googlebot is not being blocked by firewall rules? If you have Cloudflare or any other proxy I'd check to verify that. The thing is, once a sitemap fetch fails it'll keep failing, so if it was being blocked previously I'd add an allowlist for the user agent and change the path again to re-fetch.

qhanw commented 9 months ago

@ruchernchong According to your solution, the status is still showing "couldn't fetch". After checking 'sitemap.xml', I don't find other issues and it can be accessed normally in the browser. Is there any other solutions?

@qhanw have you confirmed traffic from Googlebot is not being blocked by firewall rules? If you have Cloudflare or any other proxy I'd check to verify that. The thing is, once a sitemap fetch fails it'll keep failing, so if it was being blocked previously I'd add an allowlist for the user agent and change the path again to re-fetch.

I am deploying the website with Vercel, and I am not using a Cloudflare proxy either, so I am sure that traffic from Googlebot is not being blocked by firewall rules. My sitemap.xml URL is: https://qhan.wang/sitemap.xml

ruchernchong commented 9 months ago

@ruchernchong According to your solution, the status is still showing "couldn't fetch". After checking 'sitemap.xml', I don't find other issues and it can be accessed normally in the browser. Is there any other solutions?

@qhanw have you confirmed traffic from Googlebot is not being blocked by firewall rules? If you have Cloudflare or any other proxy I'd check to verify that. The thing is, once a sitemap fetch fails it'll keep failing, so if it was being blocked previously I'd add an allowlist for the user agent and change the path again to re-fetch.

I am using Vercel to deploy the website here, and I am not using Cloudflare proxy either. It is sure that traffic from Googlebot is not blocked by firewall rules. My sitemap.xml file URL is: https://qhan.wang/sitemap.xml

I saw your sitemap. It should have worked, but I have no idea why it isn't. This is more likely a Google problem than a Vercel one.

You can try it with Bing Webmaster Tools to see if your sitemap is actually working.

ruchernchong commented 8 months ago

I just spun up a project in Astro and added a sitemap to it. Google was able to crawl /sitemap-index.xml without me needing to add a trailing slash.

This seems to be a Next.js problem. (I tried deploying on both Vercel and SST, and both returned the same thing.)

Stelkooo commented 8 months ago

I added a trailing slash to my sitemap and it started to work. Both links are loading fine on the browser.

/sitemap.xml/

And Google managed to pick it up.

image

https://ruchern.xyz/sitemap.xml/

Next.js 14 App dir here; I had the same issue where Google just says couldn't fetch.

Switched from using sitemap.ts to a sitemap.xml/route.ts to render out my sitemap, but no difference.

But adding the trailing slash worked


lucas-soler commented 8 months ago

Hi, guys, My site was finally discovered by Google, and the sitemap has been successfully fetched. Today, I took two steps:

Firstly, I moved the 'sitemap.xml' file from the 'app' folder to the 'public' folder. Secondly, and finally, I removed a redirect rule within a middleware that handles internationalization. I removed everything related to the 'public' folder from this rule.

To my surprise, it worked!

ruchernchong commented 8 months ago

Hi, guys, My site was finally discovered by Google, and the sitemap has been successfully fetched. Today, I took two steps:

Firstly, I moved the 'sitemap.xml' file from the 'app' folder to the 'public' folder. Secondly, and finally, I removed a redirect rule within a middleware that handles internationalization. I removed everything related to the 'public' folder from this rule.

To my surprise, it worked!

@leerob what are your thoughts on this? Should we move back to using the pages directory?

thanhtutzaw commented 8 months ago

Hi, guys, My site was finally discovered by Google, and the sitemap has been successfully fetched. Today, I took two steps: Firstly, I moved the 'sitemap.xml' file from the 'app' folder to the 'public' folder. Secondly, and finally, I removed a redirect rule within a middleware that handles internationalization. I removed everything related to the 'public' folder from this rule. To my surprise, it worked!

@leerob what are your thoughts on this? Moving to using the pages directory.

I am using the pages dir in Next.js 14 and still get the couldn't-fetch issue: https://thz.vercel.app/sitemap.xml

thanhtutzaw commented 8 months ago

Looks like it is Pending
It is a sitemap and not a page, it shouldn’t be indexed!

Instead, use the Live Test to check if Googlebot can fetch it!

If Google returns “URL is available to Google”, that means Google can fetch your sitemap.

Then, it is likely that the status should be interpreted as Pending. All you have to do is wait. https://www.jcchouinard.com/sitemap-could-not-be-read-couldnt-fetch-in-google-search-console/#:~:text=The%20%E2%80%9CSitemap%20could%20not%20be,the%20sitemap%20is%20not%20indexed

pedrosorrentino commented 8 months ago

I am using Next.js 14 with app dir and I have the same problem. It doesn't recognize the sitemap.xml and I've added it in several ways: https://domain.com/sitemap.xml, https://domain.com/sitemap.xml/

But Search Console still does not recognize the sitemap. Has anyone found a possible solution? I am generating the sitemap dynamically since there are more than 300 pages. Thanks and greetings to all.

nekochan0122 commented 8 months ago

Same issue here.


ruchernchong commented 8 months ago

I just found out that the sitemap.xml is returning HTTP 304 instead of HTTP 200.

robots.txt returns HTTP 200 and is working fine.

Really need @leerob @amyegan to chime in on this.

slamer59 commented 7 months ago

I just found out that the sitemap.xml is returning HTTP 304 instead of HTTP 200.

robots.txt returns HTTP 200 and is working fine.

Really need @leerob @amyegan to chime in on this.

I also had a 304 code.

I tried export const dynamic = "force-dynamic"

But I still experienced "couldn't fetch"...

ruchernchong commented 7 months ago

I just found out that the sitemap.xml is returning HTTP 304 instead of HTTP 200. robots.txt returns HTTP 200 and is working fine. Really need @leerob @amyegan to chime in on this.

I also had a 304 code

I tried to export const dynamic = "force-dynamic"

But i still experienced couldn't fetch ...

I used unstable_noStore() and it did not work, despite sitemap.xml now returning HTTP 200.

iamjoshua commented 7 months ago

A week after submitting my Next.js 14/app dir sitemap to Google Search Console, it still showed a "Sitemap could not be read" error. I think, though I'm not sure, that I submitted the sitemap before adding the robots.txt file to the app directory. Even though Googlebot was allowed to view the sitemap, perhaps it had an outdated cache of its permissions.

With this hunch, I used the "URL inspection" bar in Google Search Console and inspected my sitemap.xml. It gave me an error. I then inspected my robots.txt URL without error. I then re-inspected my sitemap, and this time there was no error. I went back to the sitemap page, removed the existing reference, and resubmitted my sitemap. This time it instantly fetched my sitemap with a success status.

Perhaps this will help others with this issue.

ruchernchong commented 7 months ago

A week after submitting my Next.js 14/app dir sitemap to google search console and it still showed a "Sitemap could not be read" error. I think, though I'm not sure, that I submitted the sitemap before adding the robots.txt file to the app directory. Even though Googlebot was allowed to view the sitemap, perhaps it had an outdated cache of its permissions.

With this hunch, I use the "URL inspection" bar in Google Search Console and inspected my sitemap.xml. It gave me an error. I then inspected my robots.txt url without error. I then reinspected my sitemap and this time there was no error. I went back to the sitemap page, removed the existing reference, and resubmitted my sitemap. This time it instantly fetched my sitemap with a success status.

Perhaps this will help others with this issue.

This did not work for me unfortunately. It still "cannot be read" by Google.