IDEA: alternative backends

I think a storage interface could be useful. Redis would work nicely for small datasets. For large crawls I've had good luck using sinew and the parallel gem to speed things along. My most recent crawl produced a 30 GB cache :)

One wrinkle is cache expiration. With Redis and Memcached, you set up expiration as each key is written, like set(key, value, 86400). With httpdisk, on the other hand, cache entries can be expired at any time. For example, you might decide to recrawl and discard pages that are more than an hour old, or maybe three days old.
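To make the difference concrete, here is a rough sketch of read-time expiration for a file-based cache. It assumes each cache entry is a plain file and that its modification time marks when it was written. `fresh?` is a hypothetical helper for illustration, not httpdisk's actual API.

```ruby
require "tmpdir"

# Hypothetical helper: an entry is fresh if the file exists and was
# written no more than max_age seconds before `now`.
def fresh?(path, max_age, now: Time.now)
  File.exist?(path) && (now - File.mtime(path)) <= max_age
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "page.html")
  File.write(path, "<html></html>")

  # Backdate the file so it looks like it was cached two hours ago.
  two_hours_ago = Time.now - 2 * 3600
  File.utime(two_hours_ago, two_hours_ago, path)

  puts fresh?(path, 3600)        # one-hour policy: false
  puts fresh?(path, 3 * 86_400)  # three-day policy: true
end
```

The same file gives different answers depending on the `max_age` chosen at read time, which is something a TTL fixed at write time can't do.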

httpdisk uses File.mtime to figure out if a value should be discarded. With other cache stores you'd have to store the key creation time to achieve the same functionality.
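Here is a minimal sketch of what storing the creation time could look like for a non-file backend. `TimestampedStore` and its methods are hypothetical names, not part of httpdisk. A plain Hash stands in for the backend, and a real Redis or Memcached client would also need serialization, which is left out here.

```ruby
# Hypothetical wrapper: keeps the write time next to each value so that
# expiration can be decided when reading instead of when writing.
class TimestampedStore
  Entry = Struct.new(:value, :written_at)

  # backend is anything that responds to [] and []= (a Hash by default).
  def initialize(backend = {})
    @backend = backend
  end

  def write(key, value, now: Time.now)
    @backend[key] = Entry.new(value, now)
  end

  # Returns the value, or nil if it is missing or older than max_age seconds.
  def read(key, max_age:, now: Time.now)
    entry = @backend[key]
    return nil if entry.nil?
    return nil if now - entry.written_at > max_age

    entry.value
  end
end

store = TimestampedStore.new
store.write("https://example.com/", "<html></html>", now: Time.now - 7200)

p store.read("https://example.com/", max_age: 3600)        # nil (older than an hour)
p store.read("https://example.com/", max_age: 3 * 86_400)  # "<html></html>"
```

One thing to watch: the backend must not set its own TTL on the key (or must set a generous one), otherwise the entry can vanish before the read-time policy gets a say.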

