ryanhoque opened 6 years ago
@ryanhoque If you need help with the metrics part, feel free to ping me. We can make a public Grafana dashboard and show everyone the metrics.
For example, Wikipedia has a great public metrics dashboard: https://grafana.wikimedia.org/dashboard/db/performance-metrics?refresh=5m&orgId=1
@simon-mo That'd be great!
What is the current status of this? @ryanhoque were you able to get this finished before the end of the semester?
@dcrankshaw I'm still having an issue specifying an external Redis cluster, and I need to coordinate with Simon to finish the metrics, but it shouldn't take more than a few hours.
@simon-mo Was this done before? It seems like you tried to set up the stress test.
Having a long-running Clipper cluster under an active workload will serve as a stress test for the system. More specifically, we will deploy Clipper on Kubernetes, with Redis configured to run in fault-tolerant mode, and continuously query Clipper's REST interface. We will train the first-place model from the 2014 Kaggle display advertising competition on a dataset from Criteo and redeploy it to Clipper every few hours. Later, the model training pipeline can be generalized to arbitrary models.
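A minimal sketch of the query side of this stress test, assuming Clipper's REST query frontend accepts a POST to `http://<host>:1337/<app_name>/predict` with a JSON body of the form `{"input": [...]}`. The host and the application name `criteo-ctr` are hypothetical placeholders, and the helper names (`run_load`, `percentile`, `LoadResult`) are illustrative rather than part of the testbed repo. The loop records per-query latency and counts failures instead of aborting, since a long-running test should ride out transient faults such as a Redis failover.

```python
import json
import math
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical endpoint: default query port 1337, placeholder host and app name.
CLIPPER_URL = "http://localhost:1337/criteo-ctr/predict"


def make_payload(features: Sequence[float]) -> bytes:
    """Encode one query as JSON of the form {"input": [...]}."""
    return json.dumps({"input": list(features)}).encode("utf-8")


def http_send(payload: bytes) -> dict:
    """POST one query to the (assumed) Clipper endpoint and decode the reply."""
    req = urllib.request.Request(
        CLIPPER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


@dataclass
class LoadResult:
    latencies_ms: List[float] = field(default_factory=list)
    errors: int = 0


def run_load(
    queries: Sequence[Sequence[float]],
    send: Callable[[bytes], dict] = http_send,
) -> LoadResult:
    """Send each query in order, recording latency (ms) or counting a failure."""
    result = LoadResult()
    for features in queries:
        start = time.perf_counter()
        try:
            send(make_payload(features))
        except Exception:
            # Count the failure and keep going rather than ending the test.
            result.errors += 1
            continue
        result.latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return result


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=99 for tail latency."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]
```

Passing `send` as a parameter keeps the loop testable without a live cluster; in a real run the default `http_send` would be used, and summaries such as `percentile(result.latencies_ms, 99)` and `result.errors` are the kind of numbers that could feed the metrics dashboard discussed above.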
Repo: https://github.com/ucbrise/clipper-serving-testbed
Design Doc: https://docs.google.com/document/d/13HZvSnTj6trosyv4SenoHLj9fcoGPdzgGBOff14arTw/edit?usp=sharing