perftool-incubator / rickshaw

A project to facilitate execution of benchmarks and tools via extensions for many target environments
Apache License 2.0

question how to correctly run rickshaw #249

Closed alicefr closed 4 months ago

alicefr commented 2 years ago

Hi, I'm trying to run crucible together with a KubeVirt VM (I have ssh access from the host). Do we have a working example for rickshaw with a remote host? The examples in the README don't work for me: the config files don't exist at the expected paths, and it has been hard for me to figure out how to get everything running. Additionally, do we have any docs or a demo that explains how to run the entire setup?

rafaelfolco commented 2 years ago
remhost1=<VM_HOSTNAME_OR_IP>
....
<truncated>

  --endpoint remotehost,host:$remhost1,user:root,client:1,userenv:alma8,cpu-partitioning:client-1:1 \
....
<truncated>

https://github.com/perftool-incubator/crucible-examples

Please let me know if you have any further questions. Feel free to ping me on Slack.

--rfolco

alicefr commented 2 years ago

@rafaelfolco many thanks. I'd just like to verify my entire setup and run a simple example. Could you please suggest an example and explain how to run the scripts? I still don't quite understand what the userenv is. Ideally, I'd like to run some fio tests.

rafaelfolco commented 2 years ago

@alicefr Assuming you have successfully installed crucible (see https://github.com/perftool-incubator/crucible/blob/master/INSTALL.md), you can check your setup as follows:

  1. Check if you have the crucible controller containers up and running:
    # podman ps -a
    CONTAINER ID  IMAGE                               COMMAND               CREATED      STATUS          PORTS       NAMES                                                                                            
    e2a88e3ff980  quay.io/crucible/controller:latest  /opt/crucible/bin...  5 weeks ago  Up 5 weeks ago              crucible-logger-02163e6e-81dc-40d8-a6a2-b838ddeb06c9                                             
    6c855b7ab2df  quay.io/crucible/controller:latest  redis-server /etc...  4 hours ago  Up 4 hours ago              crucible-redis                                                                                   
    a79c96dddbdf  quay.io/crucible/controller:latest  /usr/sbin/httpd -...  3 hours ago  Up 3 hours ago              crucible-httpd                                                                                   
    5ab54757f0b6  quay.io/crucible/controller:latest  /opt/crucible/con...  3 hours ago  Up 3 hours ago              crucible-es 
  2. Check if crucible config is present:
    # cat /etc/sysconfig/crucible                                                                                                                                                      
    CRUCIBLE_USE_CONTAINERS=1
    CRUCIBLE_USE_LOGGER=1
    CRUCIBLE_CONTAINER_IMAGE=quay.io/crucible/controller:latest
    CRUCIBLE_CLIENT_SERVER_REPO=quay.io/crucible/client-server
    CRUCIBLE_CLIENT_SERVER_AUTH="/root/auth-file.json"
    CRUCIBLE_HOME=/opt/crucible
  3. Type crucible and hit <tab>; you should see auto-completion:
    # crucible 
    help         log          repo         update       run          wrapper      console      start        get          rm           index        postprocess  es   

    This also applies to sub-commands. For instance: crucible get <tab>.

  4. Run crucible start to be sure all services are running:
    crucible start
  5. List available/supported benchmarks:
    # crucible run
    cyclictest   uperf        tracer       trafficgen   oslat        flexran      fio          hwlatdetect
  6. Create a simple mv-params file to run your benchmark. I'll be running oslat in this example:
    # cat mv-params.json
    {
    "global-options": [
        {
            "name": "global",
            "params": [
                { "arg": "duration", "vals": [ "120" ], "role": "client" },
                { "arg": "rtprio", "vals": [ "1" ], "role": "client" },
                { "arg": "smt", "vals": [ "on" ], "role": "client" }
            ]
        }
    ],
    "sets": [
        {
            "include": "global",
            "params": [
            ]
        }
    ]
    }
  7. Call crucible run with the appropriate args, or create a run.sh to make your run invocation easier to edit:

    # cat run.sh
    #!/bin/bash

    csid=1
    # remote_host is blank here; set it to your remote host before running
    remote_host=
    tags="test:sniff,user:alice"
    endpoint_arg+=" --endpoint remotehost,host:${remote_host},user:root,client:${csid},cpu-partitioning:client-1:1"

    crucible run oslat \
        --tags ${tags} \
        --num-samples 1 \
        --mv-params mv-params.json \
        ${endpoint_arg}


  8. Make sure your remote host is accessible via passwordless ssh:
    ssh $remote_host
  9. Run:
    ./run.sh
  10. The first run takes longer because it builds the image layers:
    Preparing to run oslat
    Confirming the endpoints will satisfy the benchmark-client and benchmark-server requirements
    There will be 1 client(s) and 1 server(s)
    Building test execution order
    Preparing userenvs:
    Sourcing container image; this may take a few minutes
    Searching for existing stages (1 to 10, 10 being most complete)
    Found most complete stage (number 10)
    Processing stage 1 (7ab1d564adbf1a9fce88927ebe61191e)... Ready
    ...
    ...
    Processing stage 10 (c9a3d1f9f787931f1ca9bfdfb75f3a2a)... Ready
  11. You should see the following sequence in a good run:
    Deploying endpoints
    endpoint-deploy-timeout adjusted to 480 seconds
    client-server-script-timeout adjusted to 480 seconds
    Roadblock: Fri Mar 11 14:34:48 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:endpoint-deploy
    Endpoint created
    followers: worker-1
    Roadblock: Fri Mar 11 14:35:57 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-script-start
    Roadblock: Fri Mar 11 14:35:59 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-get-data
    Roadblock: Fri Mar 11 14:36:03 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-collect-sysinfo
    Roadblock: Fri Mar 11 14:36:11 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-start-tools
    Roadblock: Fri Mar 11 14:36:40 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-start-tests
    Running tests:
    Iteration 1 sample 1 (test 1 of 1) attempt number 1
    Roadblock: Fri Mar 11 14:37:03 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:infra-start
    Roadblock: Fri Mar 11 14:37:09 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:server-start
    Roadblock: Fri Mar 11 14:37:14 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:endpoint-start
    Roadblock: Fri Mar 11 14:37:19 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:client-start
    found new timeout value: 480
    Assigning new timeout with padding for next roadblock: 480
    Roadblock: Fri Mar 11 14:37:32 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:client-stop
    Roadblock: Fri Mar 11 14:43:09 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:endpoint-stop
    Roadblock: Fri Mar 11 14:43:15 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:server-stop
    Sample 1 completed successfully with 0 failed attempts (0 total sample failures for this iteration)
    Roadblock: Fri Mar 11 14:43:22 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:1-1-1:infra-stop
    Roadblock: Fri Mar 11 14:44:02 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-stop-tests
    Roadblock: Fri Mar 11 14:44:09 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-stop-tools
    Roadblock: Fri Mar 11 14:44:13 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-send-data
    Roadblock: Fri Mar 11 14:45:29 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:client-server-script-stop
    Roadblock: Fri Mar 11 14:45:35 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:endpoint-move-data
    Roadblock: Fri Mar 11 14:45:38 UTC 2022 role: leader attempt number: 1 uuid: 1:e74216ec-c0c1-4e66-9d88-a1da01d96ef3:endpoint-finish
  12. At the end you should see the report summary, something like:
    Generating benchmark summary report
    run-id: e74216ec-c0c1-4e66-9d88-a1da01d96ef3
    tags: test=sniff user=alice
    metrics:
    source: procstat
    types: interrupts-sec
    source: oslat
    types: polling-latency-usec
    iterations:
    common params: duration=180 rtprio=1 smt=on
    iteration-id: FF0B6A48-A149-11EC-BBBA-F258863275F7
    unique params:
    primary-period name: measurement
    samples:
    sample-id: FF0FE2B2-A149-11EC-BBBA-F258863275F7
    primary period-id: FF105BFC-A149-11EC-BBBA-F258863275F7
    period range: begin: 1647009454895 end: 1647009784030
    result: (polling-latency-usec) samples: 134.00 mean: 134.00 min: 134.00 max: 134.00 stddev: NaN stddevpct: NaN

    Note: ignore the results; I just copied them from a random test I did.

  13. To show the report summary again, run:
    crucible get result --run e74216ec-c0c1-4e66-9d88-a1da01d96ef3
  14. Logs are saved at /var/lib/crucible/run:
    # tail -n1 oslat--2022-03-*/run/result-summary.txt
    ==> oslat--2022-03-08_18:56:28_UTC--76543b1d-008d-4597-8852-60e7e827673e/run/result-summary.txt <==
    result: (polling-latency-usec) samples: 23.89 mean: 23.89 min: 23.89 max: 23.89 stddev: NaN stddevpct: NaN

    ==> oslat--2022-03-08_20:46:54_UTC--b854f5fb-6241-4d51-99a1-9f142b132115/run/result-summary.txt <==
    result: (polling-latency-usec) samples: 166.00 mean: 166.00 min: 166.00 max: 166.00 stddev: NaN stddevpct: NaN
  15. Use "crucible get metric" for more sophisticated queries:
    crucible get metric --run 9343f0c4-911b-4512-8412-09383b65fb23 --period B5460C72-8440-11EC-AD59-3978773275F7 --source procstat --type interrupts-sec --breakout cstype=worker,csid=1,type,cpu=6,desc

Explore more examples like trafficgen at: https://github.com/perftool-incubator/crucible-examples/blob/main/trafficgen/README.md

Hope this helps!

That's All Folco's
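
Steps 7, 8, and 14 above lend themselves to a few small helpers, so here is a minimal sketch of what they might look like. The function names (`require_host`, `check_ssh`, `extract_mean`) are hypothetical and not part of crucible; the sketch assumes the remote user is root, as in the endpoint option, and that the `result:` lines keep the layout shown in step 14.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers for the run.sh workflow; none of these
# names come from crucible itself.

# Fail early when the remote host variable was left empty.
require_host() {
    if [ -z "$1" ]; then
        echo "remote_host is empty; set it before running" >&2
        return 1
    fi
}

# Verify passwordless ssh (assumes user root, as in the endpoint option).
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Read result-summary text on stdin and print the value that follows
# the "mean:" token on every line starting with "result:".
extract_mean() {
    awk '$1 == "result:" {
        for (i = 1; i < NF; i++) if ($i == "mean:") print $(i + 1)
    }'
}
```

One way to use them: call `require_host "$remote_host" && check_ssh "$remote_host"` before `./run.sh`, and pipe `tail -n1 oslat--2022-03-*/run/result-summary.txt` into `extract_mean` to list only the mean of each run. The `==> file <==` headers that tail prints are skipped because they do not start with `result:`.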