splitio / split-synchronizer

Golang agent for Split SDKs

Environment variables comparison between versions #175

Closed · sreeramjayan closed this issue 2 years ago

sreeramjayan commented 2 years ago

We recently updated the synchronizer from 2.4.1 to 5.0.2 and noticed that many environment variables have changed.

Can someone please verify whether the environment variables used in 2.4.1 map to the ones I've listed for 5.0.2?

| 2.4.1 | 5.0.2 |
| --- | --- |
| SPLIT_SYNC_IMPRESSIONS_POST_RATE | SPLIT_SYNC_IMPRESSIONS_POST_SIZE |
| SPLIT_SYNC_IMPRESSIONS_PER_POST | SPLIT_SYNC_IMPRESSIONS_PROCESS_BATCH_SIZE |
| SPLIT_SYNC_IMPRESSIONS_THREADS | SPLIT_SYNC_IMPRESSIONS_PROCESS_CONCURRENCY |
| SPLIT_SYNC_EVENTS_THREADS | SPLIT_SYNC_EVENTS_PROCESS_CONCURRENCY |

agustinona commented 2 years ago

Hi,

In version 5.0 we introduced a different strategy for fetching data from Redis, which protects the Redis DB from growing so large that flag definitions get evicted. The sending of data no longer relies on periodic tasks. Instead, the whole process has been rebuilt as a buffered pipeline that separates fetching from Redis from processing and publishing. The pipeline keeps working as long as there is data in Redis, and only waits for short periods of time when the impressions/events queues are fully drained.
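
To make the idea concrete, here is a minimal, hypothetical sketch of that kind of buffered pipeline in Go (not the Synchronizer's actual code): three stages connected by bounded channels, where each stage runs concurrently and only blocks when its input buffer is drained or its output buffer is full.

```go
package main

import (
	"fmt"
	"time"
)

// fetcher stands in for popping raw impressions/events from Redis.
func fetcher(out chan<- string) {
	for i := 0; ; i++ {
		out <- fmt.Sprintf("raw-item-%d", i)
	}
}

// processor stands in for deduplicating/serializing items into post-ready payloads.
func processor(in <-chan string, out chan<- string) {
	for raw := range in {
		out <- "processed:" + raw
	}
}

// poster stands in for POSTing payloads to the Split backend.
func poster(in <-chan string) {
	for item := range in {
		_ = item
	}
}

func main() {
	// The channel capacities are the intermediate buffers; larger buffers give
	// more headroom for traffic spikes at the cost of memory.
	rawBuf := make(chan string, 1024)
	readyBuf := make(chan string, 1024)

	go fetcher(rawBuf)
	go processor(rawBuf, readyBuf)
	go poster(readyBuf)

	time.Sleep(time.Second) // let the pipeline run briefly for the sake of the example
}
```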

As a result of these changes, the Synchronizer can process impressions and events at a much higher capacity than previous versions, but there is no direct equivalence between the impressions and events processing settings in versions prior to 5.0.0 and those in 5.0.0 and later.

Regarding pipeline settings, the best approach is to start with the default values and do any fine-tuning after taking some measurements. The defaults should work out of the box in most scenarios. Just keep in mind that we're ideally aiming for 32GB of RAM to handle traffic spikes properly. If you absolutely must use instances with less memory and are expecting abrupt traffic spikes, you might want to set SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE <= 5000 (default 10k) and SPLIT_SYNC_EVENTS_POST_CONCURRENCY <= 1000 (default 2k).
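
For illustration only (this is not how the Synchronizer actually parses its configuration), a memory-constrained deployment could read and sanity-check those two settings roughly like this, using the defaults and reduced values mentioned above:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt is a hypothetical helper: read an integer environment variable,
// falling back to a default when it is unset or not a number.
func envInt(name string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(name)); err == nil {
		return v
	}
	return def
}

func main() {
	// Defaults quoted in this thread: 10k batch size, 2k post concurrency.
	batchSize := envInt("SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE", 10000)
	postConcurrency := envInt("SPLIT_SYNC_EVENTS_POST_CONCURRENCY", 2000)

	// Suggested caps for instances with less than 32GB of RAM that still
	// expect abrupt traffic spikes.
	if batchSize > 5000 {
		fmt.Println("consider setting SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE <= 5000")
	}
	if postConcurrency > 1000 {
		fmt.Println("consider setting SPLIT_SYNC_EVENTS_POST_CONCURRENCY <= 1000")
	}
	fmt.Printf("events batch size: %d, post concurrency: %d\n", batchSize, postConcurrency)
}
```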

sreeramjayan commented 2 years ago

Thank you @agustinona for the response.

> Just keep in mind that we're ideally aiming for 32GB of RAM to handle traffic spikes properly.

Is 32GB RAM required for the Redis instances?

agustinona commented 2 years ago

@sreeramjayan

No, this requirement is for the Synchronizer instance.

sreeramjayan commented 2 years ago

@agustinona Are there any benchmarks for what the synchronizer can handle with 32GB? Our instances had 1GB of memory with version 2.4.1 of the Split Synchronizer, and they seem to be working well with the same memory on the 5.0.2 image. Having benchmarks would help me determine how much memory needs to be added.

agustinona commented 2 years ago

@sreeramjayan

The 32GB recommendation is derived from extensive stress testing while developing the new version. Data generated by our SDKs is now consumed by the Synchronizer through a local pipeline consisting of fetching, processing, and posting steps with intermediate buffers. This approach maximizes Redis throughput and allows large-scale customers to evict their data in an optimal way. The goal was to achieve 500k impressions per second with headroom for sudden spikes, hence the memory requirements. The requirements can be relaxed by reducing SPLIT_SYNC_EVENTS_PROCESS_BATCH_SIZE and SPLIT_SYNC_EVENTS_POST_CONCURRENCY, since they determine the size of the buffers, although we don't yet have a formula to estimate memory usage based on those config options.

sreeramjayan commented 2 years ago

Thank you for helping out, @agustinona.