webignition / robots-txt-file

Models a robots.txt file
MIT License

Can getDirectivesFor() be case insensitive? #4

Closed (halfer closed this issue 9 years ago)

halfer commented 9 years ago

I've been scratching my head over this problem for a few hours! I tried this code initially:

    return $this->
        getRobotFile()->
        getDirectivesFor('BadBot')->
        filter(array('field' => 'disallow'))->
        get();

It would pick up the * rules but not the BadBot-specific rules. However, this works:

    return $this->
        getRobotFile()->
        getDirectivesFor('badbot')->
        filter(array('field' => 'disallow'))->
        get();

Now the docs say the UA string is case insensitive ("Notice how the user agent string is case insensitive?"), but I'm definitely getting different results.

My test robots file:

    User-agent: *
    Disallow: /directory/file.html
    Disallow: /directory/file2.html

    User-agent: BadBot
    Disallow: /

I can use strtolower() on the user agent string before calling getDirectivesFor() for now, of course, but I imagine this would be better handled in the library. Have I made a mistake somewhere, or is the lookup really case sensitive here?
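To illustrate the behaviour I'm expecting, here is a minimal standalone sketch of a case-insensitive lookup. It does not use the library at all: the `$records` array and the `disallowsFor()` function are hypothetical stand-ins I made up for this example, and it simply mirrors what I described above, where the `*` rules come back alongside the agent-specific ones.

```php
<?php
// Hypothetical stand-in for a parsed robots.txt file: user agent => disallowed paths.
// This is NOT the library's internal data structure, just an illustration.
$records = [
    '*'      => ['/directory/file.html', '/directory/file2.html'],
    'BadBot' => ['/'],
];

// Hypothetical helper: collect disallow paths for a user agent,
// comparing agent names case-insensitively by lower-casing both sides.
function disallowsFor(array $records, $userAgent)
{
    $needle  = strtolower($userAgent);
    $matched = [];

    foreach ($records as $agent => $paths) {
        // The wildcard record applies to every agent; named records
        // apply when the names match regardless of case.
        if ($agent === '*' || strtolower($agent) === $needle) {
            $matched = array_merge($matched, $paths);
        }
    }

    return $matched;
}
```

With this sketch, `disallowsFor($records, 'BadBot')` and `disallowsFor($records, 'badbot')` return the same list, which is the result I'd hoped to get from getDirectivesFor().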

PHP 5.5, Ubuntu 13.10. My hacky testing was inside a project, but I can try creating a standalone script, if that's helpful.

webignition commented 9 years ago

Good question. I'll look into that.

webignition commented 9 years ago

Sounds fair that the user agent string should be case insensitive in this instance.

Fixed in release 0.2

halfer commented 9 years ago

Many thanks Jon, I look forward to giving this a try.

webignition commented 9 years ago

No problem.

Out of interest, in what projects are you using this package?

halfer commented 9 years ago

I'm working on a crawler that allows data to be fetched using a site-specific set of fetch/processing commands edited in a web interface, and which hopefully will allow me to create structured datasets from a wide range of semi-structured pages. I don't know if it is a viable project yet :-) but hope to have a prototype running in the next couple of weeks.

webignition commented 9 years ago

Awesome!