Supports the 1996 RFC, as well as some modern conventions, including the * and $ wildcard characters.
This library deals in UTF-8-encoded strings.
A path may match multiple directives. For example, /some/path/page.html matches all of these rules:
Allow: /some/
Disallow: /some/path/
Allow: /*/page.html
Each directive is given a priority, and the highest-priority matching directive is used. The priority is the length of the directive's expression. In the above example, Allow: /*/page.html has the highest priority, so the path is allowed. The priorities are:
Allow: /some/ (priority = 6)
Disallow: /some/path/ (priority = 11)
Allow: /*/page.html (priority = 12)
A Robots object is the result of parsing a single robots.txt file. It has a mapping of agent names to Agent objects, as well as a vector of the sitemaps listed in the file.
An Agent object holds the crawl-delay and Directives associated with a particular user-agent.
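One way to picture the relationship between these objects is the following sketch. It is not the library's actual class layout: the sketch namespace, the member names, the use of a negative crawl delay for "not specified", and the fallback to the * agent are all assumptions made for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch {  // hypothetical; deliberately not the Rep namespace

// A single Allow or Disallow line.
struct Directive {
    std::string expression;
    bool allow;
};

// The crawl-delay and directives for one user-agent.
struct Agent {
    float crawl_delay = -1.0f;  // assumption: negative means "not specified"
    std::vector<Directive> directives;
};

// The result of parsing one robots.txt file.
struct Robots {
    std::unordered_map<std::string, Agent> agents;  // agent name -> Agent
    std::vector<std::string> sitemaps;              // sitemaps listed in the file

    // Looks up an agent by name. Assumption: unknown agents fall back to
    // the "*" entry, and to an empty Agent if that is missing too.
    const Agent& agent(const std::string& name) const {
        static const Agent empty;
        auto it = agents.find(name);
        if (it == agents.end()) it = agents.find("*");
        return it == agents.end() ? empty : it->second;
    }
};

}  // namespace sketch
```

Keeping the per-agent state in its own object is what lets a client hold on to a single Agent when it only cares about one user-agent's rules.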
Here's an example of parsing a robots.txt file:
#include "robots.h"
std::string content = "...";
Rep::Robots robots = Rep::Robots(content);
// Is this path allowed to the provided agent?
robots.allowed("/some/path", "my-agent");
// Is this URL allowed to the provided agent?
robots.url_allowed("http://example.com/some/path", "my-agent");
If a client is interested only in the exclusion rules of a single agent, then:
Rep::Agent agent = Rep::Robots(content).agent("my-agent");
// Is this path allowed to this agent?
agent.allowed("/some/path");
// Is this URL allowed to this agent?
agent.url_allowed("http://example.com/some/path");
This library depends on url-cpp, which is included as a submodule. We provide two main targets, {debug,release}/librep.o:
git submodule update --init --recursive
make release/librep.o
To launch the vagrant image, run vagrant up (though you may have to provide a --provider flag):
vagrant up
With a running vagrant instance, you can log in and run tests:
vagrant ssh
cd /vagrant
make test
Tests are run with the top-level Makefile:
make test
These are not all hard-and-fast rules, but in general PRs have the following expectations:
PR reviews consider the design, organization, and functionality of the submitted code.
Certain types of changes should be made in their own commits to improve readability. When too many different types of changes happen simultaneously in a single commit, the purpose of each change is muddled. Giving each commit a single logical purpose makes it implicitly clear why the changes in that commit took place.
One example is a dependency update, such as bundle update or berks update.
Small new features (where small refers to the size and complexity of the change, not the impact) are often introduced in a single commit. Larger features or components might be built up piecewise, with each commit containing a single part of it (and its corresponding tests).
In general, bug fixes should come in two-commit pairs: a commit adding a failing test demonstrating the bug, and a commit making that failing test pass.
Whenever the version included in setup.py is changed (and it should be changed when appropriate using http://semver.org/), a corresponding tag should be created with the same version number (formatted v<version>).
git tag -a v0.1.0 -m 'Version 0.1.0

This release contains an initial working version of Rep::Robots.'
git push origin v0.1.0