webignition / robots-txt-file

Models a robots.txt file
MIT License

Not Valid directive mistake #8

Closed: LeMoussel closed this issue 7 years ago

LeMoussel commented 7 years ago

With this robots.txt content:

User-agent: *
Disallow:

and the code below, I get an invalid-directive result for "Disallow". I think this is a bug, because the directive is correct. Note: these sites report it as a valid robots.txt file with no warnings or failures: SeoBook Robots.txt Analyser, SeoSiteCheckup Robots.txt.

$parser = new \webignition\RobotsTxt\File\Parser();
$parser->setContent(file_get_contents('http://f4b1.com/robots.txt'));
$robotsTxtFile = $parser->getFile();

// Report any directive the parser flags as invalid
foreach ($robotsTxtFile->getAllDirectives() as list($field, $value, $valid)) {
    if ($valid === false) {
        echo "Invalid directive $field => $value";
    }
}
webignition commented 7 years ago

@LeMoussel Thanks for pointing this out.

What you say sounds right (that this is a bug). I'll quickly review the robots.txt spec just to be sure, and then I'll look into a fix.

webignition commented 7 years ago

Yep, according to the spec:

Any empty value, indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record.
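
For illustration, here is a minimal sketch of that rule (hypothetical helper code, not this library's actual parser): a robots.txt line splits into a field and a value on the first colon, and an empty Disallow value is still a well-formed directive, meaning nothing is disallowed.

// Hypothetical sketch of the spec rule, not the library's code:
// split "Field: value" on the first colon only, so values that
// themselves contain ':' (e.g. URLs) stay intact.
function parseLine(string $line): array
{
    list($field, $value) = array_pad(explode(':', $line, 2), 2, '');
    return [trim($field), trim($value)];
}

list($field, $value) = parseLine('Disallow:');
// $field === 'Disallow', $value === '' - per the spec this is a
// valid directive whose empty value means all URLs may be retrieved.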

webignition commented 7 years ago

Fixed in https://github.com/webignition/robots-txt-file/releases/tag/0.3
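
For anyone verifying the fix, here is a minimal sketch based on the API used in the snippet above (assuming the 0.3 release keeps the same method names and directive tuple format): a record with an empty Disallow value should no longer be flagged as invalid.

$parser = new \webignition\RobotsTxt\File\Parser();
$parser->setContent("User-agent: *\nDisallow:\n");

$invalidCount = 0;
foreach ($parser->getFile()->getAllDirectives() as list($field, $value, $valid)) {
    if ($valid === false) {
        $invalidCount++;
    }
}

// With the 0.3 fix, the empty Disallow value is treated as valid,
// so $invalidCount should be 0.
echo $invalidCount;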