While the current "queue-like system" on top of ClickHouse worked quite well for testing, it's nowhere near as good as required for any serious high-volume use.
Recently I did some testing on beefy AWS hardware and fixed some internal bottlenecks (not yet merged). In some testing scenarios where I could temporarily alleviate the last remaining bottleneck - job distribution (writing new jobs, updating completed ones, selecting) - Crusty was capable of doing over 900 MiB/sec - a whopping 7+ Gbit/sec! - on a 48-core (96 logical) c5.metal with a 25 Gbit/s port.

The new job queue should be solely Redis-based, using Redis modules: https://redis.io/topics/modules-intro. Rust has a good enough library to allow writing Redis module logic: https://github.com/RedisLabsModules/redismodule-rs
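To make the queue semantics concrete, here is a rough in-memory model of the two atomic operations such a module could expose - enqueue-with-dedup and batched pop. The `JobQueue` type and method names are hypothetical; a `HashSet` stands in for the Redis set (a real deployment would use a bloom filter for crawl history), and in the real design each method would be a single module command, hence atomic with respect to other clients:

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical in-memory stand-in for one shard of the job queue.
/// Inside a Redis module, each method below would be one command,
/// executing atomically with respect to other Redis clients.
struct JobQueue {
    pending: VecDeque<String>, // stands in for a Redis list/sorted set
    history: HashSet<String>,  // stands in for a bloom filter of seen domains
}

impl JobQueue {
    fn new() -> Self {
        Self { pending: VecDeque::new(), history: HashSet::new() }
    }

    /// Enqueue with dedup: accept a freshly discovered domain only if
    /// it has never been seen before. Returns true if the job was queued.
    fn enqueue(&mut self, addr_key: &str) -> bool {
        if self.history.insert(addr_key.to_string()) {
            self.pending.push_back(addr_key.to_string());
            true
        } else {
            false
        }
    }

    /// Batched pop: hand out up to `n` jobs in one round trip, which is
    /// where batching/pipelining wins over per-job requests.
    fn pop_batch(&mut self, n: usize) -> Vec<String> {
        let n = n.min(self.pending.len());
        self.pending.drain(..n).collect()
    }
}
```

This is a sketch of the intended semantics, not module code - the actual implementation would build these operations on Redis data types via redismodule-rs.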
We will use a pre-sharded queue (based on `addr_key`).

Atomic operations:

- using the correct underlying data types (mostly sets, plus a bloom filter for history) together with batching and pipelining, we can have solid throughput, low CPU usage per Redis node, and decent reliability and scalability
- careful expiration could help to avoid memory overflow on a Redis node - we always discover domains faster than we can process them
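Pre-sharding by `addr_key` can be as simple as hashing the key modulo the shard count, so every client independently agrees which Redis node owns a given domain, with no coordination. A minimal sketch (the function name and shard count are illustrative; a production system would pin a hash whose output is stable across Rust versions, e.g. FNV or xxHash, rather than `DefaultHasher`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map an addr_key to one of `num_shards` Redis nodes. Any component
/// hashing the same key the same way routes jobs for a given domain
/// to the same shard. Note: DefaultHasher is deterministic within one
/// Rust version but not guaranteed stable across versions.
fn shard_for(addr_key: &str, num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    addr_key.hash(&mut h);
    h.finish() % num_shards
}
```

Because the mapping is pure and deterministic, adding capacity means resharding; the shard count should therefore be picked generously up front - which is what "pre-sharded" implies.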