spdk/spdk

Storage Performance Development Kit
https://spdk.io/

bdevperf's -z needs much better documentation #3227

Closed. cfourteen closed this 8 months ago.

cfourteen commented 9 months ago

If you use --wait-for-rpc (BTW: this flag is called -z in the --help output, which is wrong), you cannot give bdevperf a JSON config file via the -c option. On the rpc.py side you cannot attach a controller until the init of the framework is done, so you send framework_start_init, which then causes bdevperf to start... but without any bdev devices, since you could not create them before you sent framework_start_init, it exits immediately. Was this ever tested at all?
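
For reference, the sequence described boils down to something like this (a sketch; the options mirror the examples later in this thread):

# Deferred init: bdevperf starts but waits for framework_start_init
./build/examples/bdevperf --wait-for-rpc -q 128 -o 4096 -w randread -t 5 &
# bdev_nvme_attach_controller is rejected while init is pending, so no
# bdevs can be created yet; once init completes the test starts against
# an empty bdev list and bdevperf exits immediately:
./scripts/rpc.py framework_start_init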

mikeBashStuff commented 9 months ago
# ./build/examples/bdevperf --help
./build/examples/bdevperf [options]
options:
 -c, --config <config>     JSON config file (default none)
     --json <config>       JSON config file (default none)
     --json-ignore-init-errors
                           don't exit on invalid config entry
 -d, --limit-coredump      do not set max coredump size to RLIM_INFINITY
 -g, --single-file-segments
                           force creating just one hugetlbfs file
 -h, --help                show this usage
 -i, --shm-id <id>         shared memory ID (optional)
 -m, --cpumask <mask or list>    core mask (like 0xF) or core list of '[]' embraced (like [0,1,10]) for DPDK
     --lcores <list>       lcore to CPU mapping list. The list is in the format:
                           <lcores[@CPUs]>[<,lcores[@CPUs]>...]
                           lcores and cpus list are grouped by '(' and ')', e.g '--lcores "(5-7)@(10-12)"'
                           Within the group, '-' is used for range separator,
                           ',' is used for single number separator.
                           '( )' can be omitted for single element group,
                           '@' can be omitted if cpus and lcores have the same value
 -n, --mem-channels <num>  channel number of memory channels used for DPDK
 -p, --main-core <id>      main (primary) core for DPDK
 -r, --rpc-socket <path>   RPC listen address (default /var/tmp/spdk.sock)
 -s, --mem-size <size>     memory size in MB for DPDK (default: 0MB)
     --disable-cpumask-locks    Disable CPU core lock files.
     --silence-noticelog   disable notice level logging to stderr
     --msg-mempool-size <size>  global message memory pool size in count (default: 262143)
 -u, --no-pci              disable PCI access
     --wait-for-rpc        wait for RPCs to initialize subsystems
     --max-delay <num>     maximum reactor delay (in microseconds)
 -B, --pci-blocked <bdf>   pci addr to block (can be used more than once)
 -A, --pci-allowed <bdf>   pci addr to allow (-B and -A cannot be used at the same time)
 -R, --huge-unlink         unlink huge files after initialization
 -v, --version             print SPDK version
     --huge-dir <path>     use a specific hugetlbfs mount to reserve memory from
     --iova-mode <pa/va>   set IOVA mode ('pa' for IOVA_PA and 'va' for IOVA_VA)
     --base-virtaddr <addr>      the base virtual address for DPDK (default: 0x200000000000)
     --num-trace-entries <num>   number of trace entries for each core, must be power of 2, setting 0 to disable trace (default 32768)
                                 Tracepoints vary in size and can use more than one trace entry.
     --rpcs-allowed        comma-separated list of permitted RPCS
     --env-context         Opaque context for use of the env implementation
     --vfio-vf-token       VF token (UUID) shared between SR-IOV PF and VFs for vfio_pci driver
 -L, --logflag <flag>    enable log flag (all, accel, accel_ioat, aio, app_config, app_rpc, bdev, bdev_concat, bdev_ftl, bdev_malloc, bdev_null, bdev_nvme, bdev_raid, bdev_raid0, bdev_raid1, blob, blob_esnap, blob_rw, blobfs, blobfs_bdev, blobfs_bdev_rpc, blobfs_rw, ftl_core, ftl_init, gpt_parse, ioat, json_util, log, log_rpc, lvol, lvol_rpc, nbd, notify_rpc, nvme, nvme_vfio, opal, reactor, rpc, rpc_client, scsi, sock, sock_posix, thread, trace, vbdev_delay, vbdev_gpt, vbdev_lvol, vbdev_opal, vbdev_passthru, vbdev_split, vbdev_zone_block, vfio_pci, vfio_user, vfu, vfu_virtio, vfu_virtio_blk, vfu_virtio_io, vfu_virtio_scsi, vfu_virtio_scsi_data, virtio, virtio_blk, virtio_dev, virtio_pci, virtio_user, virtio_vfio_user, vmd)
 -e, --tpoint-group <group-name>[:<tpoint_mask>]
                           group_name - tracepoint group name for spdk trace buffers (scsi, bdev, blobfs, thread, nvme_pcie, nvme_tcp, bdev_nvme, all)
                           tpoint_mask - tracepoint mask for enabling individual tpoints inside a tracepoint group. First tpoint inside a group can be enabled by setting tpoint_mask to 1 (e.g. bdev:0x1).
                            Groups and masks can be combined (e.g. thread,bdev:0x1).
                            All available tpoints can be found in /include/spdk_internal/trace_defs.h
 -q <depth>                io depth
 -o <size>                 io size in bytes
 -w <type>                 io pattern type, must be one of (read, write, randread, randwrite, rw, randrw, verify, reset, unmap, flush)
 -t <time>                 time in seconds
 -k <timeout>              timeout in seconds to detect starved I/O (default is 0 and disabled)
 -M <percent>              rwmixread (100 for reads, 0 for writes)
 -P <num>                  number of moving average period
                (If set to n, show weighted mean of the previous n IO/s in real time)
                (Formula: M = 2 / (n + 1), EMA[i+1] = IO/s * M + (1 - M) * EMA[i])
                (only valid with -S)
 -S <period>               show performance result in real time every <period> seconds
 -T <bdev>                 bdev to run against. Default: all available bdevs.
 -f                        continue processing I/O even after failures
 -F <zipf theta>           use zipf distribution for random I/O
 -Z                        enable using zcopy bdev API for read or write I/O
 -z                        start bdevperf, but wait for RPC to start tests
 -X                        abort timed out I/O
 -C                        enable every core to send I/Os to each bdev
 -j <filename>             use job config file
 -l                        display latency histogram, default: disable. -l display summary, -ll display details
 -D                        use a random map for picking offsets not previously read or written (for all jobs)
 -E                        share per lcore thread among jobs. Available only if -j is not used.

--wait-for-rpc is described as a separate arg in the help, and it's not the same as -z (-z relates to the perform_tests call). So I believe your goal is similar to:

# ./build/examples/bdevperf -t 5 -m 0x1 -q 128 -o 4096 -w randread -z &
[1] 220288
[2024-01-08 19:07:03.161424] Starting SPDK v23.09-pre git sha1 0bc4c731a / DPDK 23.03.0 initialization...
[2024-01-08 19:07:03.161481] [ DPDK EAL parameters: bdevperf --no-shconf -c 0x1 --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid220288 ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2024-01-08 19:07:03.202431] app.c: 767:spdk_app_start: *NOTICE*: Total cores available: 1
[2024-01-08 19:07:03.228079] reactor.c: 937:reactor_run: *NOTICE*: Reactor started on core 0

# ./scripts/rpc.py bdev_nvme_attach_controller --traddr 0000:17:00.0 --name kaka -t PCIE
kakan1
# PYTHONPATH=$PWD/python ./examples/bdev/bdevperf/bdevperf.py -t 10 perform_tests
Running I/O for 5 seconds...

                                                                                                Latency(us)
 Device Information          : runtime(s)       IOPS      MiB/s     Fail/s     TO/s    Average        min        max
 Job: kakan1 (Core Mask 0x1)
         kakan1              :       5.00 1177216.08    4598.50       0.00     0.00     108.68      22.25     702.62
 ===================================================================================================================
 Total                       :            1177216.08    4598.50       0.00     0.00     108.68      22.25     702.62
0
# jobs
[1]+  Running                 ./build/examples/bdevperf -t 5 -m 0x1 -q 128 -o 4096 -w randread -z &
# kill %
Received shutdown signal, test time was about 5.000000 seconds

                                                                                                Latency(us)
 Device Information          : runtime(s)       IOPS      MiB/s     Fail/s     TO/s    Average        min        max
 ===================================================================================================================
 Total                       :                  0.00       0.00       0.00     0.00       0.00       0.00       0.00

jimharris commented 9 months ago

If you use --wait-for-rpc (BTW: this flag is called -z in the --help output, which is wrong), you cannot give bdevperf a JSON config file via the -c option. On the rpc.py side you cannot attach a controller until the init of the framework is done, so you send framework_start_init, which then causes bdevperf to start... but without any bdev devices, since you could not create them before you sent framework_start_init, it exits immediately. Was this ever tested at all?

--wait-for-rpc and bdevperf's -z are two different options. The documentation of bdevperf's -z option needs to be clarified to eliminate this confusion.
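
The distinction, roughly (a sketch, not authoritative documentation):

# -z: subsystems initialize normally, so -c/--json configs load and all
# RPCs work; bdevperf just waits for the perform_tests RPC before
# running the workload.
./build/examples/bdevperf -z -q 128 -o 4096 -w randread -t 5

# --wait-for-rpc: subsystem initialization itself is deferred until the
# framework_start_init RPC; only pre-init RPCs are accepted before then.
./build/examples/bdevperf --wait-for-rpc -q 128 -o 4096 -w randread -t 5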

If you use the -z option, you can both use a JSON config file and issue additional RPCs before starting the test. Once your configuration is ready, run:

PYTHONPATH=$PYTHONPATH:python examples/bdev/bdevperf/bdevperf.py perform_tests
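
Put together, the -z flow can look like this (a sketch; bdev.json with a malloc bdev is a hypothetical stand-in for your configuration):

# Hypothetical config creating a 128 MiB malloc bdev
cat > bdev.json <<'EOF'
{
  "subsystems": [
    {
      "subsystem": "bdev",
      "config": [
        {
          "method": "bdev_malloc_create",
          "params": { "name": "Malloc0", "num_blocks": 262144, "block_size": 512 }
        }
      ]
    }
  ]
}
EOF
# Starts paused: the config file loads, and further RPCs still work.
./build/examples/bdevperf -z -c bdev.json -q 128 -o 4096 -w randread -t 5 &
# Start the actual test once configuration is complete.
PYTHONPATH=$PYTHONPATH:python examples/bdev/bdevperf/bdevperf.py perform_tests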

Let's use your issue here to track the necessary documentation improvements.

cfourteen commented 9 months ago

Thanks for the prompt response. I should have mentioned that -z was the first option I tried; it seemed to have no effect. I will try it again. What I want to do is be able to synchronize health data collection on the Python side with bdevperf. I thought the wait-for-RPC was the ticket (maybe it still is).

jimharris commented 9 months ago

Thanks for the prompt response. I should have mentioned that -z was the first option I tried; it seemed to have no effect. I will try it again. What I want to do is be able to synchronize health data collection on the Python side with bdevperf. I thought the wait-for-RPC was the ticket (maybe it still is).

Use the bdevperf.py command line mentioned in my earlier post.
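
Since -z holds off all I/O until perform_tests is received, it also gives you the synchronization point you're after. A rough sketch (the health-collection step is a placeholder for whatever your Python side does):

./build/examples/bdevperf -z -q 128 -o 4096 -w randread -t 30 &
./scripts/rpc.py bdev_nvme_attach_controller --traddr 0000:17:00.0 --name kaka -t PCIE
# I/O starts only when perform_tests is issued, so kick off the workload
# and the health collection together:
PYTHONPATH=$PWD/python ./examples/bdev/bdevperf/bdevperf.py perform_tests &
# ... collect health data here while the workload runs ...
wait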

jimharris commented 9 months ago

https://review.spdk.io/gerrit/c/spdk/spdk/+/21382

jimharris commented 8 months ago

Patch merged.