QueJJ opened 17 hours ago
In the code, pod_num represents a single set of training data. The first step is to fit the CDF of the tail latency for each microservice under varying loads (requests per container). Next, establish a linear relationship between lambda and load. The parameters obtained from these trained models are then applied directly in the allocation iteration.
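For concreteness, here is a minimal sketch of that two-step fit on synthetic data. It assumes an exponential tail, P(T > t) ≈ exp(-lambda·t), as the CDF family; the actual distribution family and data layout in the repo may differ, and all numbers here are made up:

```python
import numpy as np

def fit_lambda(latencies):
    # Exponential-tail assumption: P(T > t) ~ exp(-lambda * t),
    # so the maximum-likelihood estimate of lambda is 1 / mean latency.
    return 1.0 / np.mean(latencies)

def fit_lambda_vs_load(loads, lambdas):
    # Step 2: least-squares line lambda = a * load + b.
    a, b = np.polyfit(loads, lambdas, 1)
    return a, b

# Hypothetical per-microservice samples at three load levels
loads = np.array([10.0, 20.0, 30.0])  # requests per container
rng = np.random.default_rng(0)
# Generate synthetic latencies whose true rate is 0.5 * load
lambdas = np.array([fit_lambda(rng.exponential(1.0 / (0.5 * l), 5000))
                    for l in loads])
a, b = fit_lambda_vs_load(loads, lambdas)
```

The fitted slope and intercept (a, b) are the per-service parameters that then feed the allocation iteration.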
The search space is determined by several factors, such as the initial SLA range, the number of pods, and the permissible error margin for the iteration. We have already provided a set of trained parameters based on Trainticket. However, please note that performance varies across clusters, so the training outcomes may not transfer directly to your specific case.
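As a rough back-of-the-envelope illustration of how those three factors bound the search space (every number and name below is a placeholder I chose for the example, not a parameter shipped with the repo):

```python
import math

# Placeholder values, not the repo's actual configuration.
sla_min_ms, sla_max_ms = 50.0, 500.0   # initial SLA range
err_ms = 10.0                          # permissible error margin for iteration
max_pods_per_service = 8
num_services = 5

# SLA candidates: the range discretized at the error margin's resolution.
sla_candidates = math.floor((sla_max_ms - sla_min_ms) / err_ms) + 1

# Pod-allocation candidates: each service may receive 1..max_pods pods.
pod_allocations = max_pods_per_service ** num_services

search_space = sla_candidates * pod_allocations
print(sla_candidates, pod_allocations, search_space)
```

Tightening the error margin or widening the SLA range grows the first factor linearly, while adding services grows the second factor exponentially, which is why the pod dimension usually dominates.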
I'm currently working with the media-microsvc environment in DeathStarBench. I want to confirm whether the approach is as follows: fix a set of initial pod numbers --> collect data under different workloads --> train a model and obtain a resource allocation --> repeat the process to find the allocation that uses the fewest resources.
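In other words, the loop I have in mind looks roughly like this toy sketch, where predicted_p99 is a stand-in for the trained latency model (the function and its constant are invented for illustration, not taken from the repo):

```python
def predicted_p99(pods):
    # Stand-in trained model: tail latency shrinks as pods grow.
    # The 400.0 constant is an arbitrary placeholder.
    return 400.0 / pods

def min_pods_meeting_sla(sla_ms, max_pods=16):
    # Greedy search: smallest pod count whose predicted p99 meets the SLA.
    for pods in range(1, max_pods + 1):
        if predicted_p99(pods) <= sla_ms:
            return pods
    return max_pods

best = min_pods_meeting_sla(100.0)  # -> 4, since 400/4 = 100 <= 100
```

Is that the intended shape of the iteration, with the real model retrained from collected data at each round?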
If that is the case, can you tell me the size of the search space and how many initial points are required in the process?
Additionally, if possible, could you share the final model or the data used for resource allocation?
Thanks for your help!