Hi,
In version 5.0 we introduced a different strategy for fetching data from Redis, which aims to protect the Redis DB from growing so much that flag definitions get evicted. Sending data no longer relies on periodic tasks. Instead, the whole process is now built as a buffered pipeline that separates fetching from Redis, processing, and publishing. The pipeline keeps working as long as there is data in Redis, and only waits for short periods of time when the impressions/events queues are fully drained.
As a result of these changes, the Synchronizer can process impressions and events at a much higher capacity than previous versions, but there is no direct equivalence between the impression and event processing settings used before 5.0.0 and those used from 5.0.0 onward.
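To make the drain-and-back-off behavior more concrete, here is a minimal Go sketch of the idea. This is not the Synchronizer's actual code: `fetchBatch` and the in-memory `queue` are stand-ins for the real Redis reads.

```go
package main

import (
	"fmt"
	"time"
)

// fetchBatch stands in for the Redis pop the real pipeline performs.
func fetchBatch(queue *[]string, max int) []string {
	n := max
	if len(*queue) < n {
		n = len(*queue)
	}
	batch := (*queue)[:n]
	*queue = (*queue)[n:]
	return batch
}

func main() {
	queue := []string{"imp-1", "imp-2", "imp-3"} // pretend Redis list
	raw := make(chan string, 100)                // intermediate buffer feeding the processing step

	// Fetcher: keeps draining while data exists, backs off briefly only when the queue is empty.
	go func() {
		for {
			batch := fetchBatch(&queue, 2)
			if len(batch) == 0 {
				time.Sleep(100 * time.Millisecond)
				continue
			}
			for _, item := range batch {
				raw <- item // hand off to the processing/posting stages
			}
		}
	}()

	for i := 0; i < 3; i++ {
		fmt.Println("processing", <-raw)
	}
}
```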
Regarding pipeline settings, the best approach is to start with the default values and do any fine-tuning after taking some measurements. The defaults should work out of the box in most scenarios. Just keep in mind that we're ideally aiming for 32 GB of RAM to handle traffic spikes properly. If you absolutely must use instances with less memory and are expecting abrupt traffic spikes, you might want to set SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE <= 5000 (default 10k) and SPLIT_SYNC_EVENTS_POST_CONCURRENCY <= 1000 (default 2k).
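For example, on a lower-memory instance the two settings above could be passed to the Synchronizer container like this (how you provide the environment, e.g. an env file or your orchestrator's config, depends on your deployment; the values are just the thresholds mentioned above):

```
# Illustrative values for a lower-memory instance; keep the rest of your configuration as-is.
SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE=5000
SPLIT_SYNC_EVENTS_POST_CONCURRENCY=1000
```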
Thank you @agustinona for the response.
> Just keep in mind that we're aiming for 32gb of ram ideally to handle traffic spikes properly.
Is 32 GB of RAM required for the Redis instances?
@sreeramjayan
No, this requirement is for the Synchronizer instance.
@agustinona Are there any benchmarks for what the Synchronizer can handle with 32 GB? Our instances had 1 GB of memory with version 2.4.1 of the Split Synchronizer, and they seem to be working well with the same memory on the 5.0.2 image. Having benchmarks would help me determine how much memory needs to be added.
@sreeramjayan
The 32 GB recommendation comes from extensive stress testing performed while developing the new version.
Data generated by our SDKs is now consumed by the Synchronizer through a local pipeline consisting of fetching, processing, and posting steps with intermediate buffers. This approach maximizes Redis throughput and allows large-scale customers to evict their data optimally. The goal was to achieve 500k impressions per second with headroom for sudden spikes, hence the memory requirements. The requirements can be relaxed by reducing SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE and SPLIT_SYNC_EVENTS_POST_CONCURRENCY, since they determine the size of the buffers, although we don't yet have a formula to estimate memory usage based on those config options.
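As a rough illustration of why those two settings drive memory usage (this is not the actual implementation; the channel layout and the envInt helper are made up for the example), the batch size bounds how many raw events sit in the in-process buffer, and the post concurrency bounds how many posting workers and in-flight payloads exist at once:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// envInt is a hypothetical helper for this example, not a Synchronizer API.
func envInt(name string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(name)); err == nil {
		return v
	}
	return def
}

func main() {
	batchSize := envInt("SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE", 10000)
	postConcurrency := envInt("SPLIT_SYNC_EVENTS_POST_CONCURRENCY", 2000)

	// Larger batch size => more raw events held in memory between fetching and processing.
	events := make(chan []byte, batchSize)
	// Larger post concurrency => more posting workers and more payloads in flight at once.
	posts := make(chan []byte, postConcurrency)

	var wg sync.WaitGroup
	for i := 0; i < postConcurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for payload := range posts {
				_ = payload // an HTTP POST to Split's backend would happen here
			}
		}()
	}

	close(posts)
	wg.Wait()
	fmt.Println("buffer capacities:", cap(events), cap(posts))
}
```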
Thank you for helping out, @agustinona.
We recently updated the synchronizer from 2.4.1 to 5.0.2 and noticed that many environment variables have changed.
Can someone please verify if the environment variables used in 2.4.1 are equivalent to the ones I have in 5.0.2?