spatie / robots-txt

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
https://spatie.be/en/opensource/php
MIT License
219 stars, 36 forks

Fixes "case-sensitive" URI matching for Disallow rules in robots.txt #46

Closed: mattfo0 closed 2 weeks ago

mattfo0 commented 3 weeks ago

Based on issue #45 (Robots.txt "Disallow" URI matching should be case-sensitive), I removed the use of strtolower in parseDisallow so that the case of the URI is preserved.

The issue was opened based on Google's robots.txt documentation, which states: "The value of the disallow rule is case-sensitive." (Source: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt?hl=en#disallow)
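
To illustrate the difference, here is a minimal, self-contained sketch of prefix matching with and without lowercasing. The function isDisallowed and its $caseSensitive flag are hypothetical names made up for this example. This is not the library's parseDisallow, and the case-insensitive branch simply lowercases both sides as a simplification.

```php
<?php

// Hypothetical helper for illustration only (not part of spatie/robots-txt).
// Returns true when $path starts with any of the Disallow values in $rules.
function isDisallowed(string $path, array $rules, bool $caseSensitive = true): bool
{
    foreach ($rules as $rule) {
        if ($rule === '') {
            continue; // an empty Disallow value blocks nothing
        }

        // Lowercasing both sides emulates case-insensitive matching.
        $candidate = $caseSensitive ? $path : strtolower($path);
        $prefix = $caseSensitive ? $rule : strtolower($rule);

        // Plain prefix comparison; wildcards are deliberately left out.
        if (strncmp($candidate, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }

    return false;
}

$rules = ['/Admin/'];

var_dump(isDisallowed('/Admin/users', $rules));        // bool(true)
var_dump(isDisallowed('/admin/users', $rules));        // bool(false)
var_dump(isDisallowed('/admin/users', $rules, false)); // bool(true)
```

With case-sensitive matching, a rule for /Admin/ no longer blocks /admin/users, which is the behavior the quoted documentation describes.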


I ran PHPUnit and all existing tests passed, since none of them specifically tested case sensitivity. I added the test the_disallows_uri_check_is_case_sensitive to cover this case.

riasvdv commented 2 weeks ago

Thanks!