This is a skeleton for an idea I've had recently. I'm fully expecting this to require revisions and expansion before it's production-ready, so please feel free to propose changes.
An alternate solution I was considering was advertising a plain CRAWLER token; bots that detect it could then execute a CRAWLER <name> command and get back a response saying whether that specific crawler is allowed on the network. I'm not sure if that's overengineering things though.
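As a rough illustration of that alternative, here is a sketch of the server-side logic. The reply format, the allow-list, and the crawler names are all invented for illustration; nothing here is part of any published spec.

```python
# Hypothetical sketch of the alternative design: the server advertises a
# bare CRAWLER token, and a bot sends "CRAWLER <name>" to ask whether it
# specifically is allowed. The reply text below is invented for this sketch.

ALLOWED_CRAWLERS = {"ircstats", "netsplit"}  # example allow-list, not real config

def handle_crawler_command(name: str) -> str:
    """Build the (invented) reply to a CRAWLER <name> query."""
    if name.lower() in ALLOWED_CRAWLERS:
        return f"CRAWLER {name} :You may crawl this network"
    return f"CRAWLER {name} :You may not crawl this network"

print(handle_crawler_command("ircstats"))
```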
Problem
It's very hard to find IRC channels because there's no useful comprehensive database of channels. A few exist (e.g. netsplit), but they rely on admins manually adding them, which isn't great.
It's possible to crawl the entire address space for networks (and IRCStats currently does this) to collect data, but many IRC admins have historically resisted making that information public for privacy reasons.
Solution
This specification adds a way for networks to declare that they are okay with bots crawling them. It also allows them to specify how often they'd like to be crawled. This allows networks with privacy concerns to opt out of scanning.
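To make the idea concrete, here is a minimal sketch of how a crawler might consume such a declaration, assuming it is advertised as an ISUPPORT-style token. The token name `CRAWL` and its value format (a recrawl interval in seconds) are assumptions for illustration only, not the spec's actual wire format.

```python
# Hypothetical sketch: a crawler checking whether a network opts in to
# crawling via an ISUPPORT-style token on the 005 numeric. The token name
# "CRAWL" and the interval-in-seconds value are assumptions, not spec text.

def crawl_policy(isupport_line: str):
    """Return the advertised recrawl interval in seconds, or None if
    the network does not opt in to crawling."""
    # A 005 line looks like:
    #   :server 005 nick TOK1 TOK2=val ... :are supported by this server
    params = isupport_line.split(" :", 1)[0].split()
    for token in params[3:]:  # skip the prefix, numeric, and nick
        name, _, value = token.partition("=")
        if name == "CRAWL":
            return int(value) if value else 0
    return None

line = ":irc.example.com 005 bot NETWORK=Example CRAWL=86400 :are supported by this server"
print(crawl_policy(line))
```

A bot could then simply skip any network where this returns `None`, and rate-limit itself to the advertised interval elsewhere.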
I've put a WIP module with support for this on the InspIRCd Testnet (testnet.inspircd.org).
Rendered link.