pluginkollektiv / cachify

Smart but efficient cache solution for WordPress. Use DB, HDD, APC or Memcached for storing your blog pages. Make WordPress faster!
https://wordpress.org/plugins/cachify/
GNU General Public License v2.0

Pagespeed Insights reports "robots.txt is invalid" - The line "Disallow: /wp-content/cache/cachify/" is missing the user agent. Error: "No user-agent specified". #282

Closed: ginocremer closed this issue 10 months ago

ginocremer commented 1 year ago

Hi Cachify team, Cachify writes a line in robots.txt since 2.1.9:

Disallow: /wp-content/cache/cachify/

However, PageSpeed Insights now flags this (and deducts points accordingly) with the error "No user-agent specified".

How can this be corrected? We only found the place in the plugin code where the line is written, and indeed there is no user agent:

$data .= sprintf(
    '%2$sDisallow: %1$s/wp-content/cache/cachify/%2$s',
    ( empty( $url_parts['path'] ) ? '' : $url_parts['path'] ),
    PHP_EOL
);
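
For context, this snippet runs as a callback on WordPress's robots_txt filter. A simplified, self-contained sketch of how such a callback appends the rule (not the exact Cachify code; the callback wiring here is only illustrative) looks like this:

<?php
/*
 * Illustrative sketch, not the actual Cachify implementation:
 * a callback on the 'robots_txt' filter that appends the cache
 * directory rule to the generated robots.txt.
 */
add_filter(
    'robots_txt',
    function ( $output, $public ) {
        $url_parts = wp_parse_url( home_url() );
        $path      = empty( $url_parts['path'] ) ? '' : $url_parts['path'];

        // Append the Disallow rule; as reported, no User-agent line precedes it.
        $output .= sprintf(
            '%2$sDisallow: %1$s/wp-content/cache/cachify/%2$s',
            $path,
            PHP_EOL
        );

        return $output;
    },
    10,
    2
);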

Thanks for your feedback and kind regards.

I think it could also be something in combination with Yoast SEO.

If I open the Yoast SEO robots.txt editing tool and click save, the line from Cachify disappears:

# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow:

Sitemap: https://www.domain.be/sitemap_index.xml
# ---------------------------
# END YOAST BLOCK

Now I can write the Cachify line directly into robots.txt, and the error in PageSpeed Insights disappears as well:

# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow:

User-agent: *
Disallow: /wp-content/cache/cachify/

Sitemap: https://www.domain.be/sitemap_index.xml
# ---------------------------
# END YOAST BLOCK

But I would rather have a solution that works automatically, without manually editing the robots.txt file via Yoast SEO.

stklcode commented 1 year ago

I just tried with an unmodified WP site, only Cachify enabled. This results in a generated robots.txt like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: http://10.100.2.1/wp-sitemap.xml
Disallow: */cache/cachify/

(the */cache/... path is from the current Cachify development version; previously it was /wp-content/cache/...)

Up to WordPress 5.4 (without the built-in sitemap feature), I guess the result was valid:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: */cache/cachify/

The sitemap is hooked in with priority 0 [1], while our integration uses the default priority 10: https://github.com/pluginkollektiv/cachify/blob/aa835ce63d87b937c4eb26e5cc1dfe40d7a072be/inc/class-cachify.php#L174

Theoretically we could go to -1 or even lower, so that the line is printed before the sitemap and the block becomes valid again. But in general we don't know what any other component might have contributed to robots.txt, so maybe it's better to always generate a complete block like this:

User-agent: *
Disallow: */cache/cachify/

[1] https://github.com/WordPress/wordpress-develop/blob/0cb8475c0d07d23893b1d73d755eda5f12024585/src/wp-includes/sitemaps/class-wp-sitemaps.php#L79
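
For illustration, a minimal sketch of that approach (an assumption, not the actual change that later landed): emit a self-contained block with its own User-agent line from the robots_txt callback, so the rule stays valid regardless of what WordPress core or other plugins add:

<?php
/*
 * Illustrative sketch only: append a complete robots.txt block,
 * including a User-agent line, so the Disallow rule is always
 * attached to a valid group.
 */
add_filter(
    'robots_txt',
    function ( $output, $public ) {
        $output .= PHP_EOL
            . 'User-agent: *' . PHP_EOL
            . 'Disallow: */cache/cachify/' . PHP_EOL;

        return $output;
    },
    10,
    2
);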

2ndkauboy commented 1 year ago

That's probably the best idea: always add the User-agent to our rules.

stklcode commented 1 year ago

Should be fixed by #283