Closed oscarotero closed 9 months ago
For the Options type, perhaps we can refer to nuxt-modules/robots. (Maybe convert it to snake case / camel case?)
// PascalCase
site.use(robots([
{
UserAgent: 'ChatGPT-User',
Disallow: '/',
},
{
Comment: 'Sitemap',
Sitemap: 'https://lume.land/sitemap.xml',
}
]))
// snake_case
site.use(robots([
{
user_agent: 'ChatGPT-User',
disallow: '/',
},
{
comment: 'Sitemap',
sitemap: 'https://lume.land/sitemap.xml',
}
]))
User-agent: ChatGPT-User
Disallow: /
# Sitemap
Sitemap: https://lume.land/sitemap.xml
That's a good reference, thanks! But Lume plugins always use objects for options, so maybe this structure fits better:
site.use(robots({
agents: [
{
name: "ChatGPT-User",
disallow: "/",
},
],
sitemap: "https://lume.land/sitemap.xml"
}));
I'd like to include some shortcuts to make it more ergonomic:
site.use(robots({
agents: [
"ChatGPT-User", // Shortcut to "disallow: /".
"AI", // Shortcut to all AI agents.
]
}));
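To make the shortcut idea concrete, here is a rough sketch of how string entries could be expanded into full agent objects. The names `Agent`, `AgentInput`, and `normalizeAgents` are hypothetical and not part of Lume, and the contents of a group such as "AI" are assumed to be supplied by the caller rather than built in:

```typescript
// Hypothetical types for illustration; none of these names come from Lume.
interface Agent {
  name: string;
  disallow?: string;
  allow?: string;
}

type AgentInput = string | Agent;

// Expand string shortcuts into full agent objects.
// `groups` maps a group shortcut (e.g. "AI") to the user-agent names it stands for.
// A plain string that is not a group name becomes `{ name, disallow: "/" }`.
function normalizeAgents(
  agents: AgentInput[],
  groups: Map<string, string[]> = new Map(),
): Agent[] {
  const result: Agent[] = [];
  for (const agent of agents) {
    if (typeof agent !== "string") {
      // Already a full object: keep it unchanged.
      result.push(agent);
      continue;
    }
    const names = groups.get(agent) ?? [agent];
    for (const name of names) {
      result.push({ name, disallow: "/" });
    }
  }
  return result;
}
```

With this shape, `normalizeAgents(["ChatGPT-User"])` would yield a single entry that disallows "/" for that agent, and the plugin would only ever need to render full objects.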
Perhaps it would be more appropriate to use the disallow keyword rather than agents.
site.use(robots({
disallow: ['ChatGPT-User'],
rules: [{
userAgent: '*',
allow: '/'
}],
sitemap: 'https://lume.land/sitemap.xml',
}))
Also, I don't think maintaining an AI agents list is quite necessary.
Thinking of privacy and good defaults, maybe the plugin should disable access by default and only grant access to bots that are explicitly listed. For example:
site.use(robots({
allow: ["Google", "Bing", "Yahoo", "ChatGPT"],
paths: "/"
}));
This would generate this file:
User-Agent: *
Disallow: /
User-Agent: Googlebot
Allow: /
User-Agent: Bingbot
Allow: /
User-Agent: Yahoo-MMCrawler
Allow: /
User-Agent: ChatGPT-User
Allow: /
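A minimal sketch of how that default-deny file could be rendered, assuming the allow option takes exact user-agent strings (the mapping from short names like "Google" to "Googlebot" is left out here). `AllowListOptions` and `renderAllowList` are hypothetical names, not part of Lume:

```typescript
// Hypothetical option shape for illustration; not an actual Lume API.
interface AllowListOptions {
  allow: string[];            // exact user-agent names to grant access to
  paths?: string | string[];  // paths to allow for those agents, "/" by default
}

// Render a robots.txt that denies everything by default
// and then adds one Allow group per listed agent.
function renderAllowList(options: AllowListOptions): string {
  const rawPaths = options.paths ?? "/";
  const paths = Array.isArray(rawPaths) ? rawPaths : [rawPaths];

  const groups: string[] = ["User-agent: *\nDisallow: /"];
  for (const agent of options.allow) {
    const lines = [`User-agent: ${agent}`];
    for (const path of paths) {
      lines.push(`Allow: ${path}`);
    }
    groups.push(lines.join("\n"));
  }
  // Groups are separated by a blank line; the file ends with a newline.
  return groups.join("\n\n") + "\n";
}
```

Putting the catch-all group first is only a readability choice in this sketch; crawlers that follow the Robots Exclusion Protocol pick the most specific matching group regardless of order.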
Ideally, the plugin would let users choose between blacklist and whitelist mode.
site.use(robots({
whitelist: true,
allow: ['ChatGPT-User']
}))
site.use(robots({
// whitelist: false, (default)
disallow: ['ChatGPT-User']
}))
Given the myriad possible User-Agent values, I likewise think it's best not to manage name conversions (like Google => Googlebot, ChatGPT => ChatGPT-User).
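As a rough sketch of the whitelist/blacklist switch, the options could be turned into a flat list of rules before rendering. `Rule`, `ModeOptions`, and `buildRules` are hypothetical names, not part of Lume, and user-agent names are passed through verbatim with no conversion:

```typescript
// Hypothetical shapes for illustration; not an actual Lume API.
interface Rule {
  userAgent: string;
  allow?: string;
  disallow?: string;
}

interface ModeOptions {
  whitelist?: boolean;   // false (blacklist mode) when omitted
  allow?: string[];      // used in whitelist mode
  disallow?: string[];   // used in blacklist mode
}

// Translate the mode and the agent lists into explicit rules.
function buildRules(options: ModeOptions): Rule[] {
  const rules: Rule[] = [];
  if (options.whitelist) {
    // Whitelist mode: deny everyone, then allow the listed agents.
    rules.push({ userAgent: "*", disallow: "/" });
    for (const name of options.allow ?? []) {
      rules.push({ userAgent: name, allow: "/" });
    }
  } else {
    // Blacklist mode: only the listed agents are denied.
    for (const name of options.disallow ?? []) {
      rules.push({ userAgent: name, disallow: "/" });
    }
  }
  return rules;
}
```

In this sketch, `allow` is ignored in blacklist mode and `disallow` is ignored in whitelist mode; whether mixing both should be an error is a design question left open.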
It would be useful to block AI agents, for example: https://darkvisitors.com/robots-txt-builder