taganaka / polipus

Polipus: distributed and scalable web-crawler framework
MIT License

[Question] Using MongoDB as 'cache' aside of Redis #65

Closed · anawolak closed this issue 9 years ago

anawolak commented 9 years ago

Hi. I have a question related to the excellent presentation from RubyDay 2013. I'd like to crawl really big sites (<= 100M pages) and I'm thinking about the best approach. I'm giving Redis as much memory as possible, setting `maxmemory` and `maxmemory-policy noeviction`, and I've also enabled the MongoDB backend.

Does it work the way I think it should, i.e. when Redis memory usage gets close to the `maxmemory` value, part of the data is moved over to Mongo? If not, how can it handle such a huge number of pages? Thanks!
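For concreteness, a minimal sketch of the setup described above, assuming the pre-2.0 Mongo Ruby driver used in the Polipus examples; the `maxmemory` settings themselves belong in `redis.conf`, not in Ruby, and names like `bigsite` and the connection details are placeholders:

```ruby
require 'polipus'
require 'mongo'

# Redis is configured separately (redis.conf), e.g.:
#   maxmemory 4gb
#   maxmemory-policy noeviction

# Pre-2.0 mongo driver, as used in the Polipus examples
mongo = Mongo::Connection.new('localhost', 27017).db('crawler')

Polipus.crawler('bigsite', 'http://example.com/',
                redis_options: { host: 'localhost', port: 6379 },
                storage: Polipus::Storage.mongo_store(mongo, 'pages')) do |crawler|
  crawler.on_page_downloaded do |page|
    puts "Fetched: #{page.url}"
  end
end
```

Note that `storage` only controls where downloaded pages end up; the URL frontier itself still lives in Redis, which is what the question is about.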

OK, I found the `queue_overflow_adapter` example. That should be it!
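For anyone who finds this later, a sketch of that approach under the same assumptions as above: `queue_items_limit` caps how many URLs the Redis-backed queue holds, and once it is exceeded the overflow manager spills URLs into the Mongo-backed adapter, feeding them back as Redis drains. Option names are taken from the Polipus source; defaults may differ by version.

```ruby
require 'polipus'
require 'mongo'

mongo = Mongo::Connection.new('localhost', 27017).db('crawler')

options = {
  storage: Polipus::Storage.mongo_store(mongo, 'pages'),
  # Cap the Redis-backed URL queue; beyond this, spill to MongoDB
  queue_items_limit: 2_000_000,
  queue_overflow_adapter: Polipus::QueueOverflow.mongo_queue(mongo, 'queue_overflow'),
  # Seconds between overflow-manager checks that rebalance the two queues
  queue_overflow_manager_check_time: 60
}

Polipus.crawler('bigsite', 'http://example.com/', options) do |crawler|
  crawler.on_page_downloaded { |page| puts "Fetched: #{page.url}" }
end
```

With this in place, Redis only ever holds a bounded working set of URLs, so `maxmemory` with `noeviction` stays safe even on crawls in the 100M-page range.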