For sqlite only DATABASE_HOST is used, and it should begin with a '/'
All other DATABASE_* settings are required for mysql
DEBUG mode causes the crawler to output some stats that are generated as it goes, and other debug messages
LOGGING is a dictConfig dictionary to log output to the console and a rotating file, and works out-of-the-box, but can be modified
Current State
mysql engine untested
Issue in some situations where the database is locked and queries cannot execute. Presumably an issue only with sqlite's file-based approach
Logging
DEBUG+ level messages are logged to the console, and INFO+ level messages are logged to a file.
By default, the file for logging uses a TimedRotatingFileHandler that rolls over at midnight
Setting DEBUG in the settings toggles wether or not DEBUG level messages are output at all
Setting USE_COLORS in the settings toggles whether or not messages output to the console use colors depending on the level.
Misc
Designed to be able to run on multiple machines and work together to collect info in central DB
Queues links into the database to be crawled. This means that any machine running the crawler with the central db can grab from the same queue. Reduces crawling redundancy.
Thread pool apprach to analyzing keywords in text.