sybrew / the-seo-framework

The SEO Framework WordPress plugin.
https://theseoframework.com/
GNU General Public License v3.0

Robots.txt edits #59

Open sybrew opened 7 years ago

sybrew commented 7 years ago

Feature: Add an easy interface (like cPanel's DNS editor) to manage the robots.txt "file" output. It should only work when no static file is present, i.e., only through the use of filters.

I do not want to overwrite files or leave permanent marks.
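
For illustration only (there's no operational code behind this proposal yet), a minimal sketch of a filter-only approach, assuming WordPress's robots_txt filter; the static-file check and the rule itself are hypothetical placeholders:

// Illustrative only: hook the virtual robots.txt output, but bail when a
// static file exists (the web server would serve that file directly).
add_action( 'init', function() {
	if ( file_exists( ABSPATH . 'robots.txt' ) ) {
		return;
	}
	add_filter( 'robots_txt', function( $robots ) {
		// Rules managed through the proposed interface would be appended here;
		// this rule is a hypothetical placeholder.
		return $robots . "User-agent: some-bot\nDisallow: /\n";
	}, 11 );
} );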

Planned as a free extension πŸ˜„.

proweb commented 7 years ago

How about multisite support? Different robots.txt files for different domains.

sybrew commented 7 years ago

Multisite support is fundamental and included by default for all extensions unless otherwise stated. That's why I noted that it will only work when no static file is present 😄.

Also, all sites I own are on a Multisite network so you don't have to worry about that!

proweb commented 7 years ago

So I can generate a different robots.txt for each site in my network? How do I do that? Is there a manual?

sybrew commented 7 years ago

All open issues are just drafts for now; there's no operational code yet. That includes this issue.

The idea is that it will be different for each site in the network, yes.
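
For illustration, a minimal sketch of how per-site rules could work, assuming WordPress's robots_txt filter; the site ID and the rule are hypothetical:

// The robots_txt filter runs in the context of the current site on a
// multisite network, so output can vary per blog ID.
add_filter( 'robots_txt', function( $robots ) {
	if ( 2 === get_current_blog_id() ) { // hypothetical site ID
		$robots .= "User-agent: some-bot\nDisallow: /private/\n";
	}
	return $robots;
}, 11 );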

proweb commented 7 years ago

OK, thanks @sybrew

trainoasis commented 5 years ago

At the moment, it's not possible to add a Disallow in robots.txt via the plugin, right?

sybrew commented 5 years ago

Hi @trainoasis

That's correct. You'd want to use a WordPress filter, at priority >10, instead:

add_filter( 'robots_txt', function( $robots ) {

    // Prepend custom rules to the generated robots.txt output.
    $my_robots = <<<'MYROBOTS'
User-agent: some-bot
Disallow: /

MYROBOTS;

    return $my_robots . $robots;
}, 11 );
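
(A note on the priority: filters attached at WordPress's default priority of 10 run first, so hooking at 11 presumably ensures your rules are combined with whatever the plugin has already added to the output.)
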
ghost commented 5 years ago

Hey @sybrew, I need to manually add a disallow rule to the robots.txt, but I figured out that the plugin currently does not allow that, and there is no robots.txt in the root folder for me to edit. So can you tell me how to add it? If I have to use the above WordPress filter, where do I add it? In functions.php, or somewhere else? Sorry if this sounds silly, but I googled and could not find anything reliable.

sybrew commented 5 years ago

Hi @chandlerbing26

You can do either of the following:

  1. Add a robots.txt file to the root of your website anyway; then you'll have complete control over its contents. This is probably your best bet, but it does not translate well with WordPress Multisite's domain mapping (a corner case).
  2. Add to or overwrite our filters as you described. Yes, that can be added to the functions.php file. See https://tsf.fyi/docs/filters#where to learn about alternative methods.
ghost commented 5 years ago

Hi @sybrew, thanks! I added a robots.txt to my root folder and that worked. Thanks for replying. Maybe add a robots.txt editor in upcoming versions 😊😁

vir-gomez commented 4 years ago

Hi @sybrew, I recently added the Blackhole for Bad Bots plugin by Jeff Starr, and I must add some lines with a directive to the robots.txt.

I remember that with Yoast or others, I had my robots.txt in the public_html root directory, but now, with The SEO Framework, the robots.txt is generated dynamically and I don't know how to edit it manually.

Any suggestions? How could I add a small directive like the following for the Bing crawler or Google spiders?

User-agent: *
Disallow: /?blackhole

vir-gomez commented 4 years ago

> Hi @chandlerbing26
>
> You can do either of the following:
>
>   1. Add a robots.txt file to the root of your website anyway; then you'll have complete control over its contents. This is probably your best bet, but it does not translate well with WordPress Multisite's domain mapping (a corner case).
>   2. Add to or overwrite our filters as you described. Yes, that can be added to the functions.php file. See https://tsf.fyi/docs/filters#where to learn about alternative methods.

If we have the robots.txt dynamically created by The SEO Framework plugin and another one created manually by us, which of them should we add to Google/Bing Webmaster Tools?

sybrew commented 4 years ago

Hi @vir-gomez,

When there's a static robots.txt file in the root folder of your website, the virtual "file" cannot be outputted. So, with a robots.txt file present, The SEO Framework's output won't work.

The virtual robots.txt "file" will look a bit like this.
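
As a rough sketch, assuming WordPress's default rules plus the plugin's sitemap reference (the URL is a placeholder, and exact output depends on your settings):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml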

Now, the robots.txt file may just as well be empty because there are many other signals utilized to steer robots away from administrative and duplicated pages. Like the X-Robots-Tag HTTP header and the <meta name=robots /> HTML tag. So, feel free to use a custom robots.txt file with the blackhole directive in place.

P.S. Please send us future requests via our WordPress.org support forums. This issue is about a feature proposal, not a support topic.

sybrew commented 11 months ago

From mouste63's request:

Add a rule to block GPTBot from scraping.
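
For reference, the robots.txt rule for that would be (GPTBot is OpenAI's documented crawler token):

User-agent: GPTBot
Disallow: /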

sybrew commented 11 months ago

From https://github.com/sybrew/the-seo-framework/issues/647: Add more directives for blocking AI crawlers, including an opt-out for "Google-Extended"; see https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#user-agents-in-robots.txt.
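
Per Google's documentation linked above, the Google-Extended opt-out looks like:

User-agent: Google-Extended
Disallow: /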