mariecwhite opened 1 year ago
@monorimet @dan-garvey
WIP
Hey there, a few questions as I'm implementing this.
I am planning on having `pytest --benchmark` generate a separate log file to maintain readability in the benchmark results, where I can write full reproduction steps (including compile-time flags, etc.). We don't currently have much trace/log information exposed in our API, but if there is something specific to be included please let me know so I can have it fetched or generated.

What kind of threading is the IREE team interested in having reported in benchmark results? PyTorch and TF have APIs for fetching the number of inter- and intra-op parallelism threads used, as well as separate APIs for other parallelized processes such as input preprocessing.
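The PyTorch/TF thread-count APIs mentioned above can be bundled into one report; a minimal sketch (the helper name `thread_report` is mine, and each framework is only queried if it happens to be installed):

```python
import os

def thread_report():
    """Gather thread-count info from whichever frameworks are installed."""
    info = {"os_cpu_count": os.cpu_count()}
    try:
        import torch
        # Intra-op and inter-op parallelism thread counts.
        info["torch_intra_op"] = torch.get_num_threads()
        info["torch_inter_op"] = torch.get_num_interop_threads()
    except ImportError:
        pass
    try:
        import tensorflow as tf
        info["tf_intra_op"] = tf.config.threading.get_intra_op_parallelism_threads()
        info["tf_inter_op"] = tf.config.threading.get_inter_op_parallelism_threads()
    except ImportError:
        pass
    return info
```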
Key here is for us to make sure the number of runtime threads used is the same for IREE and the baselines (PyTorch and TF). At the moment the default configs are being used, so any info about the number of threads on all runtimes would be helpful, especially for debugging. IREE currently has the `--task_topology_group_count` runtime param and the `--iree-codegen-llvm-number-of-threads` compiler param, so if we find that the baseline is using a different config, we can adjust accordingly and get a more apples-to-apples comparison.
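To keep the comparison apples-to-apples, one option is to build the benchmark invocation with an explicit thread count; a sketch using IREE's `iree-benchmark-module` tool and the `--task_topology_group_count` flag mentioned above (the helper name and the other flag spellings are illustrative):

```python
def iree_benchmark_cmd(module_path, function, num_threads):
    """Build an iree-benchmark-module invocation pinned to num_threads.

    --task_topology_group_count controls the number of task-system worker
    threads at runtime; the module/function flag spellings here are
    illustrative and may differ across IREE releases.
    """
    return [
        "iree-benchmark-module",
        f"--module={module_path}",
        f"--function={function}",
        f"--task_topology_group_count={num_threads}",
    ]
```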
> I am planning on having pytest --benchmark generate a separate log file to maintain readability in the benchmark results, where I can write full reproduction steps (including compile-time flags, etc.)... We don't currently have much trace/log information exposed in our API but if there is something specific to be included please let me know so I can have them fetched or generated.
Full repro steps are the goal, including MLIR files, input files, model files, etc., for debugging. Would it be possible to also save the tuning configs (or an ID for the tuning config used, so that we can look it up after the fact)?
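A per-benchmark repro file could be written alongside the readable results; a minimal sketch, assuming one JSON file per benchmark (the helper name `write_repro_log` and the field names are hypothetical):

```python
import json
import pathlib

def write_repro_log(name, compile_flags, runtime_flags, artifacts, log_dir="repro_logs"):
    """Write full reproduction steps for one benchmark to its own file,
    keeping the main benchmark results readable. All field names are
    hypothetical placeholders for whatever the harness records."""
    out_dir = pathlib.Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    entry = {
        "benchmark": name,
        "compile_flags": compile_flags,   # e.g. IREE compiler flags
        "runtime_flags": runtime_flags,   # e.g. thread-count flags
        "artifacts": artifacts,           # paths to .mlir, inputs, model files
        "tuning_config_id": None,         # filled in when a tuning config is used
    }
    out = out_dir / f"{name}.repro.json"
    out.write_text(json.dumps(entry, indent=2))
    return out
```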
I suggest for TF we use our n2-highcpu-64 Icelake instances. Each has two NUMA nodes of 16 cores (no hyperthreading), and the Intel TF version pins TF to one NUMA node. We can set IREE to also use the same 16 threads on one NUMA node with numactl.
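The numactl pinning can be wrapped so the same prefix is applied to both the TF and IREE commands; a sketch (the helper name is mine, and it assumes node 0 holds the 16 cores we pin to):

```python
def numa_pinned(cmd, node=0):
    """Prefix a benchmark command so it runs on a single NUMA node.

    numactl's --cpunodebind restricts execution to the node's cores and
    --membind restricts allocations to that node's memory, matching the
    Intel TF build's pinning behavior.
    """
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *cmd]
```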
For each benchmark run, please include in the results: