seomoz / rep-cpp

Robot exclusion protocol in C++
MIT License

Incorrectly handling leading wildcard #34

Open b4hand opened 6 years ago

b4hand commented 6 years ago

For the given robots.txt file:

User-Agent: *
Disallow: */test

The path /test should not be allowed.

b4hand commented 6 years ago

I've investigated this. The problem is not at the Directive level but at the Agent level, where we normalize the paths as URLs. The string */test is transformed into /*/test before being handed to the Directive object, and /*/test doesn't match /test because the rewritten pattern requires a second slash in the path.
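
To make the mismatch concrete, here is a minimal sketch of the behavior. The names `naive_normalize`, `wildcard_aware_normalize`, and `wildcard_match` are hypothetical and are not rep-cpp's actual API. They only model a normalizer that prepends a slash and a robots.txt-style prefix matcher where `*` matches any run of characters. The `$` end anchor is left out for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the normalization step: prepend '/' when the
// pattern does not already start with one. This is an assumption about the
// effect described in the comment above, not rep-cpp's real code.
std::string naive_normalize(const std::string& pattern) {
    if (pattern.empty() || pattern[0] != '/') {
        return "/" + pattern;
    }
    return pattern;
}

// One possible fix (also hypothetical): leave a pattern that starts with a
// wildcard untouched, since '*' can already match the leading '/'.
std::string wildcard_aware_normalize(const std::string& pattern) {
    if (!pattern.empty() && pattern[0] == '*') {
        return pattern;
    }
    return naive_normalize(pattern);
}

// Returns true when `pattern` matches a prefix of `path`, with '*' matching
// any (possibly empty) sequence of characters.
bool wildcard_match(const std::string& pattern, const std::string& path) {
    const std::size_t npos = std::string::npos;
    std::size_t p = 0;       // position in pattern
    std::size_t s = 0;       // position in path
    std::size_t star = npos; // position of the most recent '*' in pattern
    std::size_t mark = 0;    // path position where that '*' started matching

    while (s < path.size()) {
        if (p == pattern.size()) {
            return true;     // whole pattern consumed: prefix match
        }
        if (pattern[p] == '*') {
            star = p++;
            mark = s;
        } else if (pattern[p] == path[s]) {
            ++p;
            ++s;
        } else if (star != npos) {
            p = star + 1;    // backtrack: let the last '*' absorb one more char
            s = ++mark;
        } else {
            return false;
        }
    }
    // Path exhausted: any remaining pattern characters must all be '*'.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}
```

With these definitions, `wildcard_match(naive_normalize("*/test"), "/test")` is false, which reproduces the bug, while `wildcard_match(wildcard_aware_normalize("*/test"), "/test")` is true. Whether skipping normalization for wildcard-leading patterns is the right fix for rep-cpp is an open question; it is only one option.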

panthony commented 5 years ago

@b4hand Hello 👋

I'm currently hitting this issue and it's something of a blocker for me. Do you know why it hasn't been fixed after a year?

I could give it a shot, but I haven't written C++ since school, so I was wondering whether you hit something harder to fix than anticipated or just let it go.