Closed fedeci closed 2 years ago
With other static site generators (Hugo, Octopress, Gatsby, etc.), you typically mark a post as noindex in the YAML frontmatter, or in some programmatic way for pages. This works because those generators follow a specific format for posts and pages.
With Next.js, you are free to do whatever you want, so you have to write a bit of code to match the condition you set.
IMO, good sitemap generation tools that are part of established formats usually honor all three:
From what I see, this package supports the config file exclusions out of the box (see config file options).
As for frontmatter parsing or some other way of triggering a "skip me!!" option on a per-page basis, you can do that in the transformations. Just test for your condition and return null to skip, as the example transformation shows.
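A minimal sketch of that per-page skip, assuming a hypothetical `hiddenPaths` set that stands in for whatever condition you test (a frontmatter flag, a path prefix, and so on). The `shouldSkip` helper and the placeholder `siteUrl` are illustrative only; the returned fields mirror the shape used in the transform posted in this thread.

```javascript
// next-sitemap.config.js (sketch, not the library's own example)

// Hypothetical set of paths flagged "skip me" by your own logic,
// e.g. collected from frontmatter at build time.
const hiddenPaths = new Set(['/drafts/wip', '/internal'])

// Returns true when a path should be left out of the sitemap.
function shouldSkip(path) {
  return hiddenPaths.has(path)
}

const sitemapConfig = {
  siteUrl: 'https://example.com', // placeholder value
  transform: async (config, path) => {
    // Returning null drops this entry from the sitemap.
    if (shouldSkip(path)) return null

    return {
      loc: path,
      changefreq: config.changefreq,
      priority: config.priority,
      lastmod: config.autoLastmod ? new Date().toISOString() : undefined,
    }
  },
}

// In a real config file you would export it:
// module.exports = sitemapConfig
```

Keeping the condition in its own small function makes it easy to swap in a different rule later without touching the transform itself.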
Thanks @eduncan911, I am already doing it in the transformations, however it would be great if it was possible to integrate it directly in the lib. I'll probably fork it and open a PR as soon as possible.
hey, in case someone is still missing this, the solution I'm currently using within my team is a custom transform function. If that's ok I can open a PR with the fix
// Needs `const fs = require('fs')` at the top of the config file.
transform: async (config, path) => {
  // Matches any <meta ...> tag followed by "noindex" on the same line
  const noIndexRegex = /<meta.*noindex/gim
  const basePath = '.next/serverless/pages'
  const filePath = `${basePath + path}.html`
  if (fs.existsSync(filePath)) {
    try {
      const data = await fs.promises.readFile(filePath, 'utf8')
      if (data.match(noIndexRegex)) {
        console.log('ignored file:', filePath)
        return null // skip this page
      }
    } catch (error) {
      console.error('err', error)
    }
  }
  return {
    loc: path,
    changefreq: config.changefreq,
    priority: config.priority,
    lastmod: config.autoLastmod ? new Date().toISOString() : undefined,
    alternateRefs: config.alternateRefs || [],
  }
},
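One caveat with the broad `/<meta.*noindex/` pattern: it also matches when the word "noindex" merely appears somewhere after any `<meta` tag on the same line, which is common in minified HTML. A tighter check might look like the sketch below. The `hasNoindex` helper is hypothetical (not part of next-sitemap), and it assumes the robots tag uses quoted `name` and `content` attributes.

```javascript
// Matches a <meta> tag whose name is "robots" and whose content
// includes "noindex", with the two attributes in either order.
const robotsNoindexPatterns = [
  /<meta\b[^>]*\bname=["']robots["'][^>]*\bcontent=["'][^"']*\bnoindex\b/i,
  /<meta\b[^>]*\bcontent=["'][^"']*\bnoindex\b[^"']*["'][^>]*\bname=["']robots["']/i,
]

// Returns true when the HTML contains a robots meta tag with noindex.
function hasNoindex(html) {
  return robotsNoindexPatterns.some((re) => re.test(html))
}
```

Inside the transform, `if (hasNoindex(data)) return null` could replace the `data.match(noIndexRegex)` test. A real HTML parser would be more robust still; the regex version just keeps the config file dependency-free.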
I exclude them like this:
module.exports = {
  siteUrl: 'https://www.xxxx',
  exclude: ['/aaa/', '/xxx', '/yyyy'], // <= exclude here
}
Yep, but this is not dynamic... 👎
The custom transform function above still works, but as of "next": "12.2.5" the path has changed (maybe in an earlier version, I don't know exactly).
Before
const basePath = '.next/serverless/pages'
After
const basePath = '.next/server/pages'
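To avoid hard-coding one of the two folders, a small helper could pick whichever exists at build time. This is only a sketch: `resolvePagesDir` is a hypothetical name, and the candidate list contains just the two paths mentioned in this thread.

```javascript
const fs = require('fs')
const path = require('path')

// Build output folders mentioned in this thread, newest first.
const candidateDirs = ['.next/server/pages', '.next/serverless/pages']

// Returns the first candidate that exists under rootDir, or null if none do.
function resolvePagesDir(rootDir = process.cwd(), dirs = candidateDirs) {
  for (const dir of dirs) {
    const full = path.join(rootDir, dir)
    if (fs.existsSync(full)) return full
  }
  return null
}
```

The transform could then use `const basePath = resolvePagesDir()` and fall through to the default return when it is null.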
Thank you @GautheyValentin & @gabrielreisn !! this works great.
Is your feature request related to a problem? Please describe. Pages with
<meta content="noindex, follow" name="robots" />
should not be added to the sitemap. I am not sure about how this library works, but I don't think it actually reads the content of the files so it may be hard to detect that meta tag.