google / robotstxt

The repository contains Google's robots.txt parser and matcher as a C++ library (compliant to C++11).
Apache License 2.0

An encoding test does not appear to match the RFC? #64

Open davepeck opened 8 months ago

The first ID_Encoding test caught me by surprise, since it does not appear to match the RFC:

  // /foo/bar?baz=http://foo.bar stays unencoded.
  {
    const absl::string_view robotstxt =
        "User-agent: FooBot\n"
        "Disallow: /\n"
        "Allow: /foo/bar?qux=taz&baz=http://foo.bar?tar&par\n";
    EXPECT_TRUE(IsUserAgentAllowed(
        robotstxt, "FooBot",
        "http://foo.bar/foo/bar?qux=taz&baz=http://foo.bar?tar&par"));
  }

However, section 2.2.2 of the REP RFC (RFC 9309) seems to indicate that /foo/bar?baz=http://foo.bar should be percent-encoded as /foo/bar?baz=http%3A%2F%2Ffoo.bar before comparison.
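
As a rough illustration of that reading, the sketch below percent-encodes ':' and '/' when they appear after the first '?' in a path, which reproduces the encoded form quoted above. The helper name EncodeQueryColonAndSlash is hypothetical and is not part of the robotstxt library. Restricting the encoding to those two characters in the query portion is an assumption drawn only from the example above, not a full implementation of the RFC's rules.

```cpp
#include <string>

// Hypothetical helper for illustration only; not part of the robotstxt library.
// Assumption: only ':' and '/' that occur after the first '?' are
// percent-encoded, matching the single example discussed in this issue.
std::string EncodeQueryColonAndSlash(const std::string& path) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(path.size());
  bool in_query = false;
  for (const char c : path) {
    if (in_query && (c == ':' || c == '/')) {
      // Emit the character as %XX using uppercase hex digits.
      const unsigned char u = static_cast<unsigned char>(c);
      out += '%';
      out += kHex[u >> 4];
      out += kHex[u & 0x0F];
    } else {
      out += c;
      // Everything after the first '?' is treated as the query portion.
      if (c == '?') in_query = true;
    }
  }
  return out;
}
```

Under these assumptions, EncodeQueryColonAndSlash("/foo/bar?baz=http://foo.bar") yields /foo/bar?baz=http%3A%2F%2Ffoo.bar, while the slashes in the /foo/bar path portion are left untouched.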

I can't decide whether I'm misreading the RFC or whether the test intentionally deviates from it in this case.

Thanks!