We generally run an autoscaling group of r4.4xlarge instances, with one VCR process for each stream we are recording, each managed by supervisor; this covers both process death and instance death fairly well. The actual requirements depend on the data volume and the number of shards.
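To illustrate the one-process-per-stream layout, here is a minimal sketch that renders one supervisor `[program:...]` section per stream. It is not our actual config: the `vcr-` program name prefix, the stream names, and the command template are hypothetical placeholders, and you would substitute the real VCR launch command for your install.

```python
def render_supervisor_config(streams, command_template):
    """Render one supervisor program section per Kinesis stream.

    command_template is a hypothetical placeholder; it must contain
    "{stream}" and should be replaced with your real VCR launch command.
    """
    sections = []
    for stream in streams:
        sections.append("\n".join([
            f"[program:vcr-{stream}]",
            f"command={command_template.format(stream=stream)}",
            "autostart=true",    # start the process when supervisor starts
            "autorestart=true",  # restart the process if it exits
        ]))
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    # Hypothetical stream names and launcher script, for illustration only.
    print(render_supervisor_config(
        ["events", "clicks"],
        "/opt/vcr/run-recorder.sh {stream}",
    ))
```

supervisor handles process death here; instance death is handled by the autoscaling group replacing the box and supervisor starting the processes again on boot.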
At Scopely, we have an in-house Kinesis worker monitoring system that watches all of our Kinesis workers, including the VCRs, and alarms if any of them fall behind. It works by watching the CloudWatch metrics emitted by the KCL. That system has not yet been open-sourced.
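Since that system isn't public, here is a rough sketch of the same idea, not the in-house implementation. It assumes the KCL publishes a `MillisBehindLatest` metric to CloudWatch in a namespace named after the KCL application; the dimensions depend on your KCL version and metrics level, so they are left to the caller. The function names and the threshold are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_falling_behind(datapoints, threshold_ms):
    """Return True if the most recent datapoint's Maximum exceeds threshold_ms."""
    if not datapoints:
        # No data at all; a real monitor may want to alarm on this separately.
        return False
    latest = max(datapoints, key=lambda d: d["Timestamp"])
    return latest["Maximum"] > threshold_ms


def fetch_lag_datapoints(application_name, dimensions, minutes=10):
    """Fetch recent MillisBehindLatest datapoints (assumed metric and namespace)."""
    import boto3  # imported lazily so the check above has no AWS dependency

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace=application_name,       # assumption: namespace == KCL app name
        MetricName="MillisBehindLatest",  # assumption: lag metric emitted by KCL
        Dimensions=dimensions,            # caller-supplied; varies by KCL setup
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    return response["Datapoints"]
```

You would run something like `is_falling_behind(fetch_lag_datapoints(app, dims), 60_000)` on a schedule for each worker and page when it returns `True`; a plain CloudWatch alarm on the same metric is a simpler alternative if you don't need custom logic.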
For replay, we tend to just launch it manually on a fairly powerful box and let it run; replay is quite rare for us, but it hasn't really proven difficult.
Perfect, thanks @avram. Closing now.
Would it be possible to provide some information on the best way of running this in a production environment?
Information I feel is missing: