sjdirect / nrobots

The Robots Exclusion Protocol, or robots.txt protocol, is a convention for preventing cooperating web spiders and other web robots from accessing all or part of a website that is otherwise publicly viewable. This project provides an easy-to-use class, implemented in C#, for working with robots.txt files.
Microsoft Public License

Slash in robot txt is misinterpreted. #10

Open sjdirect opened 5 years ago

sjdirect commented 5 years ago

The two lines below are included in robots.txt:

Disallow: /om/work-at-abc/lediga-jobb/
Disallow: /om/work-at-abc/lediga-jobb?

Consequently, this page is incorrectly disallowed: https://www.abc.se/om/work-at-abc/lediga-jobb

Log: [https://www.abc.se/om/work-at-abc/lediga-jobb] not crawled, [Disallowed by robots.txt file]
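For reference, here is a minimal Python sketch of the plain prefix matching I would expect for these two rules. It is not the NRobots code; the names `request_target`, `is_disallowed` and `RULES` are made up for illustration, and it deliberately ignores Allow rules, `*` and `$` wildcards, percent-encoding and user-agent groups. It also assumes a URL without a query string is compared by its path alone.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path, plus '?query' when a non-empty query string is present."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


def is_disallowed(url: str, disallow_rules: list[str]) -> bool:
    """True if any non-empty Disallow value is a prefix of the request target."""
    target = request_target(url)
    return any(rule and target.startswith(rule) for rule in disallow_rules)


# The two Disallow values from the robots.txt in this report.
RULES = [
    "/om/work-at-abc/lediga-jobb/",
    "/om/work-at-abc/lediga-jobb?",
]

base = "https://www.abc.se/om/work-at-abc/lediga-jobb"
print(is_disallowed(base, RULES))                # False: neither rule is a prefix
print(is_disallowed(base + "/", RULES))          # True: matches the trailing-slash rule
print(is_disallowed(base + "?page=2", RULES))    # True: matches the '?' rule
```

Under these assumptions the bare URL is allowed, because both rules are one character longer than its path and so cannot be a prefix of it. The `page=2` query is only an illustrative value.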

Kind Regards, Ola

sjdirect commented 5 years ago

Moved from https://github.com/sjdirect/abot/issues/186