webignition / robots-txt-file

Models a robots.txt file
MIT License

Missing User Agent Gives Error #14

Status: Open. DaveChild opened this issue 6 years ago.

DaveChild commented 6 years ago

I had to process a robots.txt file containing the following line, which caused a fatal error:

User-agent:

No user agent was specified, and the robots.txt parser failed with an error when checking a URL against that file.

webignition commented 6 years ago

A user-agent directive line with no user-agent string (such as User-agent:) is invalid, but it certainly should not cause the parser to fail hard.

I'll look into a fix.

webignition commented 6 years ago

I can't replicate this using content that contains an empty user-agent directive. In a test, I added:

use webignition\RobotsTxt\File\Parser; // namespace assumed from the package layout

$source = <<<EOF
User-agent:
Allow: /

EOF;

$parser = new Parser();
$parser->setSource($source);

$file = $parser->getFile();

echo $file;

// This echoes:
// user-agent:*
// allow: /

I'll need enough information to reproduce the issue. Can you reduce it to a failing test case?

DaveChild commented 6 years ago

I think it was something like:

User-agent: baiduspider
Disallow: /

User-agent:

User-agent: *
Allow: /blah/
Allow: /foo/
Allow: /bar/

But I'll dig out the original bug and put a test case together.

webignition commented 6 years ago

Cheers, that would certainly help.

I also tried the robots.txt content suggested above, and it didn't produce any errors either.

webignition commented 6 years ago

@DaveChild Is this still an issue? Have you found a way to reproduce it?