opsani / servox

Optimization agent for Opsani
https://opsani.com/
Apache License 2.0

Prometheus connector duration cannot be set from configuration #82

Closed: linkous8 closed this issue 3 years ago

linkous8 commented 3 years ago

The Prometheus connector relies on control as its only source of timing configuration: https://github.com/opsani/servox/blob/b4fe0f84ea069a541698dfcb8e6deaf170580da5/servo/connectors/prometheus.py#L281
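For context, the pattern at that line looks roughly like the sketch below. This is illustrative only, with assumed names and simplified types rather than the actual connector code; the point is that the measurement window comes solely from the control object handed down by the optimizer, and nothing on the connector's own configuration can set it.

# Illustrative sketch only; names and types are assumed, not the real servo models.
import asyncio
from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class Control:
    # Assumed shape; the real control model carries more fields.
    warmup: timedelta = field(default_factory=timedelta)
    duration: timedelta = field(default_factory=timedelta)

async def measure(metrics: list, control: Control) -> None:
    # The only timing input is control; there is no duration field on the
    # Prometheus connector configuration to fall back to or override with.
    window = control.warmup + control.duration
    print(f"measuring {metrics} over {window.total_seconds()}s")
    await asyncio.sleep(window.total_seconds())

if __name__ == "__main__":
    asyncio.run(measure(["throughput"], Control(duration=timedelta(seconds=5))))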

blakewatters commented 3 years ago

What's your desire here? I have a config option on the Vegeta Connector for setting the duration but was thinking about removing it as it is deferential to whatever comes down from the control fields in the server config.

If this is just about testing in an ad-hoc fashion, you can directly trigger a measurement against the Prometheus Connector via the CLI and specify the duration:

❯ poetry run servo measure --help
Usage: servo measure [OPTIONS] [METRICS]...

  Capture measurements for one or more metrics

Options:
  [METRICS]...                    Metrics to measure
  -c, --connectors [CONNECTORS]...
                                  Connectors to measure from
  -d, --duration DURATION         Duration of the measurement  [default: 0]
  -v, --verbose                   Display verbose output  [default: False]
  --humanize / --no-humanize      Display human readable output for units
                                  [default: True]

  --help                          Show this message and exit.
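For example, an ad-hoc run scoped to the Prometheus connector might look something like poetry run servo measure -c prometheus -d 5m, with the metric names and duration format adjusted to your setup (those particulars are illustrative, not prescriptive).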

There are also some unmerged integration-testing bits for talking to a real Prometheus backend from the test suite that I can point you to if that's helpful. I have also been working on a robust stand-in server for the backend to use under test; it speeds up client development when we don't really care what the backend is doing but want to run the full stack without relying on a ton of mocking.

If you are trying to override stuff in a real interaction with OCO-e, it probably requires an expanded conversation. It really wants the servo to be deferential to the control values and may fail out the operations if we try to subvert it.

linkous8 commented 3 years ago

It really wants the servo to be deferential to the control values and may fail out the operations if we try to subvert it.

In that case, I agree with your sentiment on removing the config option from the vegeta connector so that control becomes the only source of truth for measure timing.

This is about testing (and faster turnaround thereof), but my desire is more end-to-end than ad hoc, since running against the real optimizer has often highlighted issues that unit testing missed in the past.

linkous8 commented 3 years ago

Here's my test setup for reference (example_servo.yaml is the config, not a manifest): https://github.com/opsani/servox/tree/feature/argo-rollouts/example_app

I run it on a local minikube and replace the vegeta/prometheus URLs with the ones produced by:

minikube service web
minikube service -n opsani-monitoring prometheus

That relies on NodePorts, so I pinned it to a single replica to cut out networking headaches.
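For the URL substitution, a tiny illustrative helper like the one below can print the NodePort URLs non-interactively (minikube's --url flag prints the URL instead of opening a browser); the service names match the commands above, everything else is an assumption about the setup.

# Illustrative helper: print NodePort URLs to paste into example_servo.yaml
# in place of the in-cluster vegeta/prometheus base URLs.
import subprocess

def service_url(name, namespace=None):
    # minikube service --url prints http://<node-ip>:<node-port> without
    # opening a browser.
    cmd = ["minikube", "service", name, "--url"]
    if namespace:
        cmd = ["minikube", "service", "-n", namespace, name, "--url"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

if __name__ == "__main__":
    print("web:", service_url("web"))
    print("prometheus:", service_url("prometheus", namespace="opsani-monitoring"))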

blakewatters commented 3 years ago

I have removed the duration option from the public API on Vegeta Configuration on master to clean that up.

On the testing front, I totally hear you. I try to cover my logic and client-side bits as much as possible with unit and functional tests that are isolated from the optimizer.

I then use a thinner layer of integration tests (since they are so slow) that work against a real optimizer. I'm sure you haven't seen this stuff yet because I'm still pulling it all together and it isn't documented, but there are some bits in the tree to support it.

In particular, I have some pytest fixtures that will build your working copy into a container image, push it into your local minikube registry, bring minikube up, and then return control to your tests. I'm using the bleeding-edge Docker BuildX engine, which has some new caching tricks, so it's decently fast (relatively speaking).
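The shape of those fixtures is roughly the sketch below. This is illustrative rather than the actual unmerged code; the image tag, registry address, and build flags are all assumptions.

# Illustrative sketch of a session-scoped build-and-push fixture; the tag,
# registry address, and flags are assumptions, not the real unmerged fixtures.
import subprocess
import pytest

def _run(*cmd):
    subprocess.run(cmd, check=True)

@pytest.fixture(scope="session")
def servo_image():
    tag = "localhost:5000/servox:dev"
    _run("minikube", "start")
    _run("minikube", "addons", "enable", "registry")
    # buildx/BuildKit layer caching keeps repeat builds of the working copy fast.
    _run("docker", "buildx", "build", "--tag", tag, "--push", ".")
    return tag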

I have then been using kubetest to handle loading manifests to bring up my test app targets (you'll see some manifests in tests/manifests) and automate the setup. I also have a variation where it pushes into an ECR registry and tests against an EKS cluster when I need more firepower.
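On the kubetest side, usage typically looks something like the sketch below; the manifest directory and assertions are assumptions, while applymanifests and the kube fixture are kubetest's own marker and fixture.

# Illustrative kubetest usage; manifest paths and assertions are assumptions.
import pytest

@pytest.mark.integration
@pytest.mark.applymanifests("manifests")
def test_app_comes_up(kube):
    # Wait for the objects loaded from the manifests to register with the
    # cluster, then sanity-check that the test app deployment exists.
    kube.wait_for_registered(timeout=30)
    assert kube.get_deployments(), "expected a deployment from the manifests"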

Getting all this testing infrastructure merged down and documented is a big priority for me but it's taking a while to get it where I want it because I'm really pushing out beyond what is easily available and off the shelf.

If you grep for pytest.mark.integration and dig around in conftest.py you can grab some of this stuff. Master is a little thin at the minute but I am on a war path to bring down my long-lived branches that are holding the rest of it.
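(Selecting them once you find them is standard pytest marker mechanics: pytest -m integration to run only the integration tests, or -m "not integration" to skip them.)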