After some debugging, I've discovered that providing a URL without a trailing slash (https://example.com rather than https://example.com/) fails certain checks, notably the robots.txt mayIndex() check.
This makes sense: no path is returned when such a URL is parsed, so if the robots.txt file contains a blank Disallow: rule (which a lot do), the empty rule matches the empty path and mayIndex() returns false.
Three possible fixes:
1. Update the docs to be clearer, with a simple note saying that a trailing slash is required for bare domains.
2. Add a slash when a bare URL is provided without one.
3. When looping through the URLs to check against (line 49 in RobotsTxt.php), check whether the left side is an empty string and ignore it.
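
To illustrate the second and third fixes, here is a minimal sketch of the logic in Python. It is not the library's PHP code: `normalize_bare_url` and `may_index` are hypothetical names, and the simple prefix match is an assumption made for illustration. It relies on the robots.txt convention that an empty Disallow: value disallows nothing.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_bare_url(url):
    """Fix 2 (sketch): give a bare domain an explicit root path."""
    parts = urlsplit(url)
    if parts.path == "":
        # "https://example.com" parses with an empty path; make it "/".
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def may_index(url, disallow_rules):
    """Hypothetical stand-in for the mayIndex() check, with fix 3 applied."""
    path = urlsplit(normalize_bare_url(url)).path
    for rule in disallow_rules:
        if rule == "":
            # Fix 3 (sketch): an empty Disallow: value disallows nothing, so skip it.
            continue
        if path.startswith(rule):  # assumed matching strategy, for illustration only
            return False
    return True
```

With either fix in place, `may_index("https://example.com", [""])` no longer treats the bare domain as disallowed, while a real rule such as `"/private"` still blocks matching paths.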
Because this issue seems to have been inactive for quite some time now, I've automatically closed it. If you feel this issue deserves some attention from my human colleagues, feel free to reopen it.