maurodoglio opened 11 years ago
Maybe, if we can't scan fast enough, we should make it so that scanning/aggregation can be spread across multiple machines in parallel, all writing into (or having their databases replicate into) a central database that drives the UI.
A starting point could be the introduction of Celery. We could split the scan → parse → aggregate flow into asynchronous tasks, so that each step can run in parallel across workers.
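As a broker-free sketch of that split (with Celery, each function below would become an `@app.task` and the steps would be composed with `chain(scan.s(url), parse.s(), aggregate.s())`; `scan`, `parse`, and `aggregate` here are hypothetical stand-ins shown as plain functions so the pipeline shape is clear):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real steps.
def scan(site):
    return f"<html>{site}</html>"     # fetch the raw page

def parse(html):
    return {"length": len(html)}      # extract whatever metrics we need

def aggregate(results, record):
    results.append(record)            # write into the central store

def pipeline(site):
    # One site flows scan -> parse; aggregation happens centrally.
    return parse(scan(site))

sites = [f"site{i}.example" for i in range(10)]
results = []
# A thread pool stands in for a fleet of worker machines.
with ThreadPoolExecutor(max_workers=4) as pool:
    for record in pool.map(pipeline, sites):
        aggregate(results, record)

print(len(results))  # 10
```

The key property is that `scan` and `parse` are independent per site, so fan-out is trivial; only `aggregate` touches shared state, which maps onto the single central database idea above.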
The system needs to be able to run through 1,000 sites in 4-8 hours, doing both scraping and analysis.