Closed LeMoussel closed 7 years ago
Hi @LeMoussel
This library is for parsing the contents of a robots.txt file into a model (an instance of webignition\RobotsTxt\File\File) which can then be examined programmatically as required.
The model exists at a lower level of abstraction than your question, so it can't directly do what you want. The model has no understanding of the different types of directives (allow, disallow and so on), nor does it understand the values of the directives (*deny_all/$, *deny_googlebot/$).
If you're willing to examine the raw directive names and raw directive values, you can certainly iterate over the set of directives for a given user agent and check whether any of them match the conditions you're interested in.
Here is a (somewhat convoluted) example from a unit test I just created:
public function testFoo()
{
    // Nowdoc body and closing marker stay at column 0 for older PHP versions.
    $source = <<<'EOD'
User-agent: *
Disallow: *deny_all/$
User-agent: Googlebot
Disallow: *deny_googlebot/$
EOD;

    $parser = new \webignition\RobotsTxt\File\Parser();
    $parser->setSource($source);
    $robotsTxtFile = $parser->getFile();

    $areAllUserAgentsDisallowedDenyAllPath = false;
    $directivesForAllAgents = $robotsTxtFile->getDirectivesFor('*')->get();

    foreach ($directivesForAllAgents as $directiveForAllUserAgents) {
        /* @var $directiveForAllUserAgents \webignition\RobotsTxt\Directive\Directive */
        $isDisallowDirective = $directiveForAllUserAgents->getField() === 'disallow';
        $isDenyAllPath = false;

        // Only compare the raw value when the directive is a disallow.
        if ($isDisallowDirective) {
            $isDenyAllPath = (string)$directiveForAllUserAgents->getValue() === '*deny_all/$';
        }

        if ($isDisallowDirective && $isDenyAllPath) {
            $areAllUserAgentsDisallowedDenyAllPath = true;
        }
    }

    $this->assertTrue($areAllUserAgentsDisallowedDenyAllPath);
}
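The test above only compares raw strings. Since the model does not interpret directive values, checking a real URL would need something that does. Here is a minimal sketch of that step, assuming the common robots.txt convention that `*` matches any run of characters and a trailing `$` anchors the end of the path. Both function names are hypothetical helpers, not part of this library, and the substring comparison for user agents is a simplification.

```php
<?php

/**
 * Hypothetical helper (not part of the library): test a URL path against a
 * raw directive value, treating '*' as "any run of characters" and a
 * trailing '$' as "end of path".
 */
function pathMatchesDirectiveValue(string $directiveValue, string $urlPath): bool
{
    if ($directiveValue === '') {
        return false; // an empty value matches nothing
    }

    $anchoredAtEnd = substr($directiveValue, -1) === '$';
    if ($anchoredAtEnd) {
        $directiveValue = substr($directiveValue, 0, -1);
    }

    // Quote regex metacharacters, then turn each quoted '*' back into '.*'.
    $regex = str_replace('\*', '.*', preg_quote($directiveValue, '#'));

    return preg_match('#^' . $regex . ($anchoredAtEnd ? '$' : '') . '#', $urlPath) === 1;
}

/**
 * Hypothetical helper: decide whether a "User-agent:" group applies to a full
 * user agent string. Assumption: a case-insensitive substring match is enough.
 */
function groupAppliesToUserAgent(string $groupUserAgent, string $userAgentString): bool
{
    if ($groupUserAgent === '') {
        return false;
    }

    return $groupUserAgent === '*'
        || stripos($userAgentString, $groupUserAgent) !== false;
}

$googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$path = parse_url('http://mytestsite.com/deny_googlebot/', PHP_URL_PATH);

var_dump(groupAppliesToUserAgent('Googlebot', $googlebot));
var_dump(pathMatchesDirectiveValue('*deny_googlebot/$', $path));
```

You would call these with the raw values returned by getField() and getValue() while iterating, as in the test. Note that this sketch does not decide precedence between several matching groups or between competing allow and disallow rules; a complete checker would have to handle that as well.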
Might have a solution to this soon, reopening ...
Now resolved in robots-txt-file, which supersedes this package.
For example, with this robots.txt content:
User-agent: *
Disallow: *deny_all/$
User-agent: Googlebot
Disallow: *deny_googlebot/$
How can I test whether http://mytestsite.com/deny_all/ and http://mytestsite.com/deny_googlebot/ are allowed or disallowed for all user agents ("*") or for the Googlebot user agent ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")?