Closed GoogleCodeExporter closed 9 years ago
If you change the RobotRule class in SimpleRobotRules.java to:
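
    // Requires: import java.util.regex.Pattern;
    protected class RobotRule implements Comparable<RobotRule> {
        String _prefix;    // plain path prefix, or null for a pattern rule
        Pattern _pattern;  // compiled pattern, or null for a prefix rule
        boolean _allow;    // true for "Allow:", false for "Disallow:"

        public RobotRule(String prefix, boolean allow) {
            _prefix = prefix;
            _pattern = null;
            _allow = allow;
        }

        public RobotRule(Pattern pattern, boolean allow) {
            _prefix = null;
            _pattern = pattern;
            _allow = allow;
        }

        // Disallow rules sort before allow rules. Two rules of the same type
        // compare as equal (0), so a TreeSet keeps only one of them.
        @Override
        public int compareTo(RobotRule o) {
            if (this._allow == o._allow) {
                return 0;
            } else if (this._allow && !o._allow) {
                return 1;
            } else {
                return -1;
            }
        }
    }
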
and change the rule collection to:

    private TreeSet<RobotRule> _rules;

then this problem should be fixed: disallow rules get higher priority and go first.
Original comment by y.vladim...@semrush.com
on 24 Oct 2013 at 12:01
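As a rough illustration of what the proposed ordering does, here is a minimal sketch assuming prefix-only rules and a "first matching rule in sorted order decides" lookup. The names `DisallowFirstDemo`, `Rule` and `isAllowed` are hypothetical stand-ins, not the crawler-commons API. It also shows why a `TreeSet` is risky with this `compareTo`: rules that compare as 0 are treated as duplicates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; Rule stands in for RobotRule (prefix rules only).
class DisallowFirstDemo {
    static final class Rule implements Comparable<Rule> {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }

        // Same ordering as the proposed RobotRule.compareTo:
        // disallow (false) sorts before allow (true); same-type rules tie.
        @Override
        public int compareTo(Rule o) {
            return Boolean.compare(allow, o.allow);
        }
    }

    // Sort a copy of the rules, then let the first matching prefix decide.
    // Collections.sort is stable, so tied rules keep their original order.
    static boolean isAllowed(List<Rule> rules, String path) {
        List<Rule> sorted = new ArrayList<>(rules);
        Collections.sort(sorted);
        for (Rule r : sorted) {
            if (path.startsWith(r.prefix)) {
                return r.allow;
            }
        }
        return true; // no rule matched
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule("/", false), new Rule("/public", true));
        // The disallow rule "/" is checked first, so this prints false.
        System.out.println(isAllowed(rules, "/public/page"));

        // A TreeSet treats compareTo() == 0 as a duplicate, so the second
        // allow rule is silently dropped and this prints 1.
        TreeSet<Rule> set = new TreeSet<>(List.of(new Rule("/a", true), new Rule("/b", true)));
        System.out.println(set.size());
    }
}
```

Keeping the rules in a list and sorting it (rather than relying on a `TreeSet`) avoids losing rules whenever two of them tie under `compareTo`.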
The above changes would put allow rules before disallow rules, but the Google
implementation has an additional condition: the "allow before disallow"
heuristic only applies if the allow path has at least as many characters as
the disallow path.
So if my allow rule was /dir and there was also a disallow rule for
/dir/subdir, and both matched, then the disallow rule would win.
Original comment by kkrugler...@transpac.com
on 25 Oct 2013 at 12:20
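The precedence described in the previous comment can be sketched without any sorting: among the matching rules, the one with the longest path wins, and on a length tie the allow rule wins. This is a minimal sketch assuming plain prefix rules (no `*` or `$` wildcards); `LongestMatchDemo`, `Rule` and `isAllowed` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of longest-match precedence (prefix rules only, no * or $).
class LongestMatchDemo {
    static final class Rule {
        final String path;
        final boolean allow;

        Rule(String path, boolean allow) {
            this.path = path;
            this.allow = allow;
        }
    }

    // The matching rule with the longest path wins; on a length tie, allow wins.
    static boolean isAllowed(List<Rule> rules, String urlPath) {
        Rule best = null;
        for (Rule r : rules) {
            if (!urlPath.startsWith(r.path)) {
                continue;
            }
            boolean longer = best == null || r.path.length() > best.path.length();
            boolean tieAllow = best != null && r.path.length() == best.path.length() && r.allow;
            if (longer || tieAllow) {
                best = r;
            }
        }
        return best == null || best.allow; // no match means allowed
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule("/dir", true), new Rule("/dir/subdir", false));
        System.out.println(isAllowed(rules, "/dir/subdir/page")); // false: the disallow path is longer
        System.out.println(isAllowed(rules, "/dir/other"));       // true: only the allow rule matches
    }
}
```

Because the winner is chosen by length rather than by position, the verdict does not depend on the order in which the rules appear in robots.txt.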
Hmm, so if we sorted by prefix length first (longer goes first), and then by
allow before disallow, I think we'd mostly get the implementation right.
Original comment by kkrugler...@transpac.com
on 25 Oct 2013 at 12:33
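The ordering proposed above (longer prefixes first, then allow before disallow) can be expressed as a chained `Comparator`. This is a minimal sketch with hypothetical names (`RuleOrderDemo`, `Rule`, `ORDER`, `sorted`), assuming prefix rules only; it just shows the resulting order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sort rules longer-prefix-first, then allow before disallow.
class RuleOrderDemo {
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }

        @Override
        public String toString() {
            return (allow ? "Allow: " : "Disallow: ") + prefix;
        }
    }

    // Longer prefixes first; on equal length, allow rules first
    // (!allow is false for allow rules, and false sorts before true).
    static final Comparator<Rule> ORDER =
            Comparator.comparingInt((Rule r) -> r.prefix.length())
                      .reversed()
                      .thenComparing(r -> !r.allow);

    // Return a sorted copy; List.sort is stable, so full ties keep file order.
    static List<Rule> sorted(List<Rule> rules) {
        List<Rule> copy = new ArrayList<>(rules);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("/", false),
                new Rule("/dir", true),
                new Rule("/dir/subdir", false),
                new Rule("/dir", false));
        // Prints: Disallow: /dir/subdir, Allow: /dir, Disallow: /dir, Disallow: /
        for (Rule r : sorted(rules)) {
            System.out.println(r);
        }
    }
}
```

With this order, taking the first rule whose prefix matches the path gives the same verdict as a longest-match rule with ties going to allow, at least for plain prefixes; wildcard patterns would need a different notion of "length".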
Ok
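
    // Longer prefixes sort first; for equal lengths, allow sorts before disallow.
    // Assumes _prefix is non-null (prefix rules only). Rules that tie return 0,
    // so a TreeSet would keep only one of them.
    @Override
    public int compareTo(RobotRule o) {
        if (_prefix.length() > o._prefix.length()) {
            return -1;
        } else if (_prefix.length() < o._prefix.length()) {
            return 1;
        } else if (this._allow == o._allow) {
            return 0;
        } else if (this._allow) {
            return -1;
        } else {
            return 1;
        }
    }
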
Original comment by y.vladim...@semrush.com
on 25 Oct 2013 at 12:12
What would be great is a test that tries out the examples at the end of
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
(with both orders of the allow/disallow rules) to validate whether the above
would actually work.
Original comment by kkrugler...@transpac.com
on 26 Oct 2013 at 11:16
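A test along the lines suggested in the previous comment might look like the sketch below: it evaluates the same pair of rules in both file orders and checks that the verdict is unchanged. It uses the ordering discussed in this thread (longer prefixes first, then allow before disallow) with a first-match lookup. The two cases are modeled on the prefix-only precedence examples on the Google robots.txt page (allow /p vs. disallow / for /page, and allow /folder vs. disallow /folder for /folder/page); treat the exact rows and verdicts as assumptions to check against that page. `PrecedenceCheck` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical check harness; prefix-only rules (no * or $ patterns).
class PrecedenceCheck {
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    // Ordering from the thread: longer prefixes first, then allow before disallow.
    static final Comparator<Rule> ORDER =
            Comparator.comparingInt((Rule r) -> r.prefix.length())
                      .reversed()
                      .thenComparing(r -> !r.allow);

    // First matching rule in sorted order decides; no match means allowed.
    static boolean isAllowed(List<Rule> rules, String path) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(ORDER);
        for (Rule r : sorted) {
            if (path.startsWith(r.prefix)) {
                return r.allow;
            }
        }
        return true;
    }

    // Evaluate the two rules in both file orders; both must give the expected verdict.
    static boolean sameVerdictBothOrders(Rule a, Rule b, String path, boolean expected) {
        return isAllowed(List.of(a, b), path) == expected
                && isAllowed(List.of(b, a), path) == expected;
    }

    public static void main(String[] args) {
        // allow: /p, disallow: / -> /page is allowed (the longer match wins)
        System.out.println(sameVerdictBothOrders(
                new Rule("/p", true), new Rule("/", false), "/page", true));
        // allow: /folder, disallow: /folder -> /folder/page is allowed (a tie goes to allow)
        System.out.println(sameVerdictBothOrders(
                new Rule("/folder", true), new Rule("/folder", false), "/folder/page", true));
    }
}
```

Both lines print `true`. Cases involving `*` or `$` would need pattern matching that this sketch does not attempt.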
Rolled in the change, as suggested by y.vladimirov, in r116.
Original comment by kkrugler...@transpac.com
on 14 Mar 2014 at 12:03
Original issue reported on code.google.com by
kkrugler...@transpac.com
on 17 Mar 2013 at 6:41