iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

Log config and runtime details of each benchmark in Shark Tank #10201

Open mariecwhite opened 1 year ago

mariecwhite commented 1 year ago

For each benchmark run, please include in the results:

powderluv commented 1 year ago

@monorimet @dan-garvey

erob710 commented 1 year ago

WIP

monorimet commented 1 year ago

Hey there, a few questions as I'm implementing this.

  1. What kind of threading is the IREE team interested in having reported in benchmark results? PyTorch and TF have APIs for fetching the number of inter- and intra-op parallelism threads used, as well as separate APIs for other parallelized work such as input preprocessing (see the sketch after this list for how these can be queried).
  2. I am planning on having pytest --benchmark generate a separate log file to maintain readability in the benchmark results, where I can write full reproduction steps (including compile-time flags, etc.)... We don't currently have much trace/log information exposed in our API, but if there is something specific to be included, please let me know so I can have it fetched or generated.
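
For reference, a rough sketch of how those thread counts can be queried from the framework APIs (assumes torch and TF2 are importable in the benchmark environment):

```python
# Sketch: query framework-level thread settings so they can be logged alongside
# each benchmark result. Assumes torch and tensorflow (TF2) are installed.
import torch
import tensorflow as tf

def framework_thread_config():
    return {
        # Intra-op threads parallelize individual ops; inter-op threads run
        # independent ops concurrently.
        "torch_intra_op_threads": torch.get_num_threads(),
        "torch_inter_op_threads": torch.get_num_interop_threads(),
        # TF returns 0 when the default (system-chosen) value is in use.
        "tf_intra_op_threads": tf.config.threading.get_intra_op_parallelism_threads(),
        "tf_inter_op_threads": tf.config.threading.get_inter_op_parallelism_threads(),
    }

print(framework_thread_config())
```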
mariecwhite commented 1 year ago

What kind of threading is the IREE team interested in having reported in benchmark results? PyTorch and TF have APIs for fetching the number of inter- and intra-op parallelism threads used, as well as separate APIs for other parallelized work such as input preprocessing.

The key here is to make sure the number of runtime threads used is the same for IREE and the baselines (PyTorch and TF). I know the default configs are being used at the moment, so any info about the number of threads on all runtimes would be helpful, especially for debugging. IREE has the --task_topology_group_count runtime param and the --iree-codegen-llvm-number-of-threads compiler param, so if we find that the baselines are using a different configuration, we can adjust accordingly and get a more apples-to-apples comparison.
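
As a rough sketch of what matching the runtime thread count might look like (the module path, function name, and exact iree-benchmark-module flag spellings below are placeholders and may differ by IREE release):

```python
# Untested sketch: run the compiled IREE module with the task topology sized
# to match the baseline's intra-op thread count, so both runtimes use the same
# number of worker threads.
import subprocess

def run_iree_benchmark(vmfb_path: str, entry_function: str, num_threads: int):
    cmd = [
        "iree-benchmark-module",
        f"--module={vmfb_path}",            # placeholder path to the compiled .vmfb
        f"--function={entry_function}",     # placeholder entry point name
        "--device=local-task",
        # Runtime worker count, via the --task_topology_group_count param above.
        f"--task_topology_group_count={num_threads}",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True)

# e.g. match PyTorch's intra-op thread count:
# run_iree_benchmark("model.vmfb", "main", torch.get_num_threads())
```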

I am planning on having pytest --benchmark generate a separate log file to maintain readability in the benchmark results, where I can write full reproduction steps (including compile-time flags, etc.)... We don't currently have much trace/log information exposed in our API, but if there is something specific to be included, please let me know so I can have it fetched or generated.

Full repro steps are the goal, including MLIR files, input files, model files, etc. for debugging. Would it be possible to also save the tuning configs (or an ID for the tuning config used, so that we can look it up after the fact)?
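
A minimal sketch of what such a side log could look like in pytest (field names, paths, and flags below are illustrative placeholders, not an existing Shark Tank schema):

```python
# Untested sketch: append full reproduction details for each benchmark to a
# separate JSON-lines log, keeping the main benchmark results readable.
import json
from pathlib import Path

import pytest

REPRO_LOG = Path("benchmark_repro_log.jsonl")  # assumed location, one JSON object per benchmark

@pytest.fixture
def repro_logger():
    def _log(entry: dict):
        with REPRO_LOG.open("a") as f:
            f.write(json.dumps(entry) + "\n")
    return _log

def test_resnet50_benchmark(repro_logger):
    # ... run the benchmark itself here ...
    repro_logger({
        "model": "resnet50",                                       # placeholder
        "mlir_path": "/path/to/model.mlir",                        # placeholder
        "input_files": ["/path/to/input0.npy"],                    # placeholder
        "compile_flags": ["--iree-hal-target-backends=llvm-cpu"],
        "tuning_config_id": "default",  # an ID that can be looked up after the fact
    })
```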

powderluv commented 1 year ago

I suggest that for TF we use our n2-highcpu-64 Ice Lake instances. They have two NUMA nodes of 16 cores each (no HT), and the Intel TF version pins TF to one NUMA node. We can set IREE to use the same 16 threads on one NUMA node with numactl.
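
For example, a sketch of the pinned invocation (requires numactl on the host; the iree-benchmark-module arguments are the same placeholders as in the sketch above):

```python
# Untested sketch: pin the IREE benchmark to NUMA node 0 with 16 worker
# threads, mirroring how the Intel TF build is pinned to a single NUMA node.
import subprocess

cmd = [
    "numactl", "--cpunodebind=0", "--membind=0",
    "iree-benchmark-module",
    "--module=model.vmfb",                  # placeholder
    "--function=main",                      # placeholder
    "--device=local-task",
    "--task_topology_group_count=16",       # 16 threads on one NUMA node
]
subprocess.run(cmd, check=True)
```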

erob710 commented 1 year ago

Close