Closed TheHmadQureshi closed 5 years ago
This is really a two-part question.
Part 1 involves standing up enough Elastiflow instances to handle the flow volume you require. I would do this by building 2,500-flow-per-second Elastiflow containers with Docker, then using docker-swarm to spawn as many containers, across as many hosts, as needed to support the target flow volume. For 100K flows per second, that is 40 containers, and going by Rob's recommendation of 12 cores / 64 GB per container, be prepared to provide 480 cores and 2,560 GB of RAM. That works out to roughly 5 or 6 modern servers (90 cores / 512 GB each). In addition, you would need a load balancer in front of all the Logstash listeners; in my case I use nginx to balance the incoming flows across the backend pool using a static client-address-to-backend map.
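The nginx piece described above can be sketched with the `stream` module. This is a minimal sketch, not my production config; the listener port and backend addresses are hypothetical placeholders for your Logstash pool:

```nginx
# Sketch: nginx stream block load-balancing NetFlow/sFlow UDP datagrams.
# Backend addresses and ports are hypothetical; adjust to your environment.
stream {
    upstream logstash_flow_pool {
        # Hash on the exporter's source address so a given exporter always
        # lands on the same Logstash backend (the client-address-to-backend
        # static mapping, which also keeps flow template state consistent).
        hash $remote_addr consistent;
        server 10.0.0.11:2055;
        server 10.0.0.12:2055;
        server 10.0.0.13:2055;
    }

    server {
        listen 2055 udp;
        proxy_pass logstash_flow_pool;
        # Flow exporters expect no reply, so don't wait for UDP responses.
        proxy_responses 0;
    }
}
```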
Part 2 requires scaling the Elasticsearch cluster sufficiently to handle 100K events per second. From my experience building large Elastic clusters, I would design this as at least 2 data nodes and 1 master node. There are a number of tunables you will need to optimize beyond an out-of-the-box Elasticsearch install, but there are plenty of articles online on how to scale this part. I've built Elasticsearch clusters handling 1M events per second, so 100K is fairly trivial; however, you are probably looking at needing 72 cores and 256 GB, which is at least one more server, if not two.
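As a rough sketch of that node split (one dedicated master, two data nodes), the master's `elasticsearch.yml` might look like the following, assuming a 6.x-era cluster; all names and hosts are placeholders, and a real deployment should use more than one master-eligible node:

```yaml
# elasticsearch.yml sketch for the dedicated master node (hypothetical names).
cluster.name: elastiflow-cluster
node.name: es-master-1
node.master: true     # eligible for master election
node.data: false      # holds no shard data
network.host: _site_
discovery.zen.ping.unicast.hosts: ["es-master-1", "es-data-1", "es-data-2"]

# On the two data nodes, invert the roles:
#   node.master: false
#   node.data: true
```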
Thus, at $15K per server you are looking at around $100K in hardware costs alone, or a substantial monthly bill from AWS. You can draw your own conclusions about how well this solution really scales given that expense, and whether Logstash is truly the right ingestion engine at this scale. At this overall cost it would make sense to bring in a short-term consultant with experience scaling Elastic, rather than relying on online documentation alone.
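The hardware arithmetic above works out as follows; the inputs (2,500 flows/s per container, 12C/64G per container, 90C/512G per server, $15K per server) are the figures quoted in this thread:

```python
import math

# Back-of-the-envelope capacity math for 100K flows/second,
# using the per-container and per-server figures from the thread.
target_fps = 100_000
fps_per_container = 2_500

containers = target_fps // fps_per_container      # 40 containers
cores_needed = containers * 12                    # 480 cores
ram_gb_needed = containers * 64                   # 2,560 GB RAM

# Modern server: 90 cores / 512 GB RAM.
servers_by_cores = math.ceil(cores_needed / 90)   # 6 (cores are the bottleneck)
servers_by_ram = math.ceil(ram_gb_needed / 512)   # 5
logstash_servers = max(servers_by_cores, servers_by_ram)

# Plus at least one more server for the Elasticsearch cluster (72C / 256G).
total_servers = logstash_servers + 1
cost = total_servers * 15_000                     # ~$105K, i.e. "around $100K"
print(containers, cores_needed, ram_gb_needed, logstash_servers, cost)
```

Rounding up by cores is what drives the "5 or 6 servers" estimate; RAM alone would need only 5.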
@z0lt3c What protocol of flow do you use?
@sgnsys3 I have used Elastiflow with both the NetFlow and sFlow protocols. I've managed flows from Cisco IOS, Cisco NX-OS, Cisco XR, Arista, F5, Palo Alto, and softflowd agents. Aside from dealing with the template ID issues, it has worked great.
@z0lt3c provided a good answer here. I'm closing this.
Hello, I want to deploy Elastiflow for my enterprise and expect to receive 100,000 flows per second. I tested Elastiflow by forwarding around 2,000 flows/sec to my current environment, which runs Elastiflow and ELK on a single server, and it worked perfectly. Now I want to distribute the processing load of Logstash/Kibana. Is there any documentation available that might help me?
Thanks! Hammad