The original PR was made and reviewed in this fork
What is done?
- Removed the synthetic generation from the control node
- Removed periodic synthetic query generation
- Replaced fetching the synthetic data from Redis in the query node with on-demand generation
- Each text prompt in a synthetic query is now unique
- Images are cached in Redis. Since the cache is accessed a lot, disk reads could become a bottleneck, so it makes sense to keep the images in memory. Because the query nodes are going to scale, the cache lives in Redis so every replica can access it
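The on-demand generation with unique prompts could be sketched with a tiny bigram Markov model. All names below are hypothetical illustrations; the real model in the repo may be trained and sampled differently:

```python
import random


class MarkovPromptGenerator:
    """Tiny bigram Markov model; a stand-in for the real prompt model."""

    def __init__(self, corpus: str, seed=None):
        self.rng = random.Random(seed)
        self.transitions = {}  # word -> list of observed next words
        words = corpus.split()
        for prev, nxt in zip(words, words[1:]):
            self.transitions.setdefault(prev, []).append(nxt)

    def generate(self, start: str, max_words: int = 8) -> str:
        out = [start]
        for _ in range(max_words - 1):
            choices = self.transitions.get(out[-1])
            if not choices:
                break  # dead end: no observed successor
            out.append(self.rng.choice(choices))
        return " ".join(out)


def unique_prompts(gen, start, n, max_tries=1000):
    """Sample until n distinct prompts are collected (the uniqueness this PR adds)."""
    seen = set()
    for _ in range(max_tries):
        seen.add(gen.generate(start))
        if len(seen) == n:
            break
    return seen
```

Generating per-query instead of reading pre-generated prompts from Redis removes the periodic generation job entirely, at the cost of running the model inside each query node (the trade-off noted under improvements below).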
Tested both with unit tests and by running the control and query nodes against Postgres and Redis. Many things were commented out and fake tasks were created in the control node. Verified that Redis is used as intended by adding log statements to the `get_random_image_b64` function.
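The shared image cache described above could look roughly like this. This is a sketch, not the repo's actual implementation; the dict-backed `FakeRedis` stands in for a real `redis.Redis` client, which exposes the same `get`/`set` calls used here:

```python
import base64
import random


class FakeRedis:
    """Dict-backed stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_random_image_b64(client, image_paths, rng=random):
    """Sketch of the cached lookup: pick a random image, serve the base64
    payload from Redis if present, otherwise read it from disk once and
    cache it so every query-node replica can reuse it."""
    path = rng.choice(image_paths)
    key = f"img_b64:{path}"
    cached = client.get(key)
    if cached is not None:
        # redis-py returns bytes by default; decode for a str payload
        return cached.decode() if isinstance(cached, bytes) else cached
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    client.set(key, encoded)  # shared across all replicas
    return encoded
```

With a real client, only the construction changes (e.g. `redis.Redis(host=..., port=...)`); the cache logic stays the same.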
What could be improved
- The Markov model is trained and loaded by each query node replica. This is due to me not realizing the query node should scale up; the model could be trained only once. We could also move the synthetic generation out of the query node so it can be scaled up/down independently, and then use Redis queues to pass the generated data to the query nodes
- Profiling the current generation would expose the bottlenecks
- The masks are decoded and re-encoded each time. We could at least avoid the decode by caching the image size, and also the encode by caching the mask
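The last point could be a small memoization layer. A minimal sketch, assuming masks have a stable id; `zlib`+`base64` stand in for whatever codec the repo actually uses:

```python
import base64
import zlib

_size_cache = {}     # mask_id -> (width, height); skips the decode
_encoded_cache = {}  # mask_id -> encoded payload; skips the encode


def encode_mask(mask_id, raw_mask: bytes, size):
    """Return the wire format for a mask, computing it at most once per id."""
    if mask_id not in _encoded_cache:
        _size_cache[mask_id] = size
        _encoded_cache[mask_id] = base64.b64encode(zlib.compress(raw_mask)).decode()
    return _encoded_cache[mask_id]


def mask_size(mask_id):
    """Image size without decoding, provided the mask was seen before."""
    return _size_cache.get(mask_id)
```

A per-process dict only helps within one replica; if the same masks are reused across replicas, this cache could live in Redis alongside the images.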
Closes #55