iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Have an iree_runtime_size_benchmark target and track its size #3963


benvanik commented 3 years ago

We should be able to have a minimal target that pulls in the dylib backend (no flags, no logging, etc.) and just uses iree_vm_invoke to call a model, similar to the existing iree/vm/bytecode_module_size_benchmark. The intent is to see what impact a standalone IREE build has on a user's application size. Even better would be another standard_library_size_benchmark that is just main() and pulls in a few things we can reasonably expect any integrator to already have (like printf, std::string/std::vector as used by our current C++ code, etc.). Diffing those two would give us the delta IREE adds to an average application.

@hanhanW I feel like you were playing with some benchmark tracking stuff at some point - could we potentially feed the byte size values to the same thing as time values and get size-over-time tracking?

We should be compiling with Release or MinSizeRel in CMake, as that is what anyone integrating us will be using.
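A rough sketch of the delta measurement described above, assuming the two proposed benchmark binaries already exist in a MinSizeRel build tree (the paths and binary names are hypothetical, since neither target exists yet):

```python
#!/usr/bin/env python3
# Sketch: measure the size delta IREE adds over a baseline binary.
# The build directory and binary names are assumptions based on the proposal.
import os

BUILD_DIR = "build"
IREE_BINARY = os.path.join(BUILD_DIR, "iree_runtime_size_benchmark")
BASELINE_BINARY = os.path.join(BUILD_DIR, "standard_library_size_benchmark")

iree_size = os.path.getsize(IREE_BINARY)
baseline_size = os.path.getsize(BASELINE_BINARY)
print(f"IREE binary:     {iree_size} bytes")
print(f"baseline binary: {baseline_size} bytes")
print(f"delta added by IREE: {iree_size - baseline_size} bytes")
```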

benvanik commented 3 years ago

As a possible extension (or part of the core work) it'd be fantastic to wire up https://github.com/google/bloaty. Dumping a bloaty report would tell us which files are taking up the bytes, and per run we could diff against the base revision to see what the increase was across compilation units.

$ ./bloaty bloaty -d compileunits
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  34.8%  10.2Mi  43.4%  2.91Mi    [163 Others]
  17.2%  5.08Mi   4.3%   295Ki    third_party/protobuf/src/google/protobuf/descriptor.cc
   7.3%  2.14Mi   2.6%   179Ki    third_party/protobuf/src/google/protobuf/descriptor.pb.cc
   4.6%  1.36Mi   1.1%  78.4Ki    third_party/protobuf/src/google/protobuf/text_format.cc
   3.7%  1.10Mi   4.5%   311Ki    third_party/capstone/arch/ARM/ARMDisassembler.c
   1.3%   399Ki  15.9%  1.07Mi    third_party/capstone/arch/M68K/M68KDisassembler.c
   3.2%   980Ki   1.1%  75.3Ki    third_party/protobuf/src/google/protobuf/generated_message_reflection.cc
   3.2%   965Ki   0.6%  40.7Ki    third_party/protobuf/src/google/protobuf/descriptor_database.cc
   2.8%   854Ki  12.0%   819Ki    third_party/capstone/arch/X86/X86Mapping.c
   2.8%   846Ki   1.0%  66.4Ki    third_party/protobuf/src/google/protobuf/extension_set.cc
   2.7%   800Ki   0.6%  41.2Ki    third_party/protobuf/src/google/protobuf/generated_message_util.cc
   2.3%   709Ki   0.7%  50.7Ki    third_party/protobuf/src/google/protobuf/wire_format.cc
   2.1%   637Ki   1.7%   117Ki    third_party/demumble/third_party/libcxxabi/cxa_demangle.cpp
   1.8%   549Ki   1.7%   114Ki    src/bloaty.cc
   1.7%   503Ki   0.7%  48.1Ki    third_party/protobuf/src/google/protobuf/repeated_field.cc
   1.6%   469Ki   6.2%   427Ki    third_party/capstone/arch/X86/X86DisassemblerDecoder.c
   1.4%   434Ki   0.2%  15.9Ki    third_party/protobuf/src/google/protobuf/message.cc
   1.4%   422Ki   0.3%  23.4Ki    third_party/re2/re2/dfa.cc
   1.3%   407Ki   0.4%  24.9Ki    third_party/re2/re2/regexp.cc
   1.3%   407Ki   0.4%  29.9Ki    third_party/protobuf/src/google/protobuf/map_field.cc
   1.3%   397Ki   0.4%  24.8Ki    third_party/re2/re2/re2.cc
 100.0%  29.5Mi 100.0%  6.69Mi    TOTAL
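A possible shape for that per-run diff, driven from Python so it could slot into CI; the binary paths are placeholders, and bloaty is assumed to be on PATH (the `new -- old` diff syntax and `-d compileunits` are standard bloaty usage):

```python
#!/usr/bin/env python3
# Sketch: diff the freshly built binary against the base revision's binary
# with bloaty, broken down by compilation unit. Paths are placeholders.
import subprocess

NEW_BINARY = "build/iree-run-module"        # built at the current revision
BASE_BINARY = "artifacts/iree-run-module"   # fetched from the base revision

report = subprocess.run(
    ["bloaty", "-d", "compileunits", NEW_BINARY, "--", BASE_BINARY],
    check=True, capture_output=True, text=True)
print(report.stdout)
```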
hanhanW commented 3 years ago

Yes, we can do it. The benchmarking infra just needs a number; it can be time, size, or whatever you want, so it's really flexible. It just requires a black box (like a script) to generate the number. Then we can bundle the number with the git revision and make something like https://mako.dev/benchmark?benchmark_key=5538704950034432

(imagine the time unit being a size unit instead)
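A sketch of what that black box could look like: a script that emits the size number bundled with the git revision (the binary path and output format are assumptions, not part of any existing infra):

```python
#!/usr/bin/env python3
# Sketch: a "black box" that produces one number per revision for the
# tracking infra. The binary path and output format are assumptions.
import json
import os
import subprocess

BINARY = "build/iree-run-module"  # placeholder for whichever binary we track

revision = subprocess.run(["git", "rev-parse", "HEAD"],
                          check=True, capture_output=True,
                          text=True).stdout.strip()
sample = {"git_revision": revision, "size_bytes": os.path.getsize(BINARY)}
print(json.dumps(sample))
```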

hanhanW commented 3 years ago

We can probably wire the bloaty report up to a run (like https://mako.dev/run?run_key=6234066354438144&~c=1&~v=1&~g=1), aggregate the sizes into a single number, and show that number in the benchmark (like https://mako.dev/benchmark?benchmark_key=5538704950034432).

(The current usage is only one data point; we could probably have a profiling plot in a run, like https://mako.dev/run?run_key=5681260090359808&~pl=1&~pe=1&~st=1&~dl=1&~de=1&~dt=1&~pet=1&~det=1.)
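One way the aggregation step could look, assuming bloaty's CSV output mode and summing the file sizes into a single number (the "filesize" column name follows bloaty's CSV output as I understand it and may need adjusting):

```python
#!/usr/bin/env python3
# Sketch: collapse a bloaty per-compilation-unit report into one number that
# the benchmark dashboard can plot. Column names are assumptions.
import csv
import subprocess

BINARY = "build/iree-run-module"  # placeholder

csv_report = subprocess.run(
    ["bloaty", "--csv", "-d", "compileunits", BINARY],
    check=True, capture_output=True, text=True).stdout
total_file_size = sum(
    int(row["filesize"]) for row in csv.DictReader(csv_report.splitlines()))
print(total_file_size)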

hanhanW commented 3 years ago

I looked into Mako and found that it's hard to embed the bloaty report into a Mako run. However, we can still save the binary as a Buildkite artifact. Once we hit a regression, we can pull the binary and do a diff (with a command like ./bloaty bloaty -- oldbloaty).
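A sketch of the artifact-saving side, assuming the build runs under a Buildkite agent (the binary path is a placeholder); on a regression the old binary would be pulled back down and diffed with bloaty as described above:

```python
#!/usr/bin/env python3
# Sketch: keep each run's binary around as a Buildkite artifact so a later
# regression can be diffed with bloaty. The binary path is a placeholder.
import subprocess

BINARY = "build/iree-run-module"

# Upload this run's binary; later, `buildkite-agent artifact download` can
# fetch it again for a `bloaty <new> -- <old>` diff.
subprocess.run(["buildkite-agent", "artifact", "upload", BINARY], check=True)
```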

I'm thinking of configuring IREE with -DIREE_HAL_DRIVERS_TO_BUILD="VMLA", -DIREE_HAL_DRIVERS_TO_BUILD="VMLA;DyLIB", and -DIREE_HAL_DRIVERS_TO_BUILD="VMLA;Vulkan" in a Release build, then tracking the binary size of iree-run-module. What do you think? (We could consider adding a MinSizeRel mode as well.)

I did it locally and got 1.8M for the VMLA;DyLib config, 2.3M for the VMLA;Vulkan config, and 1.8M for the VMLA config.
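A sketch of how tracking those three configurations could be automated; the flag values come from the comment above, while the build-directory layout and the iree-run-module output path are assumptions that may need adjusting:

```python
#!/usr/bin/env python3
# Sketch: configure and build each HAL driver combination in Release mode and
# record the size of iree-run-module. Paths and layout are assumptions.
import os
import subprocess

CONFIGS = {
    "vmla": "VMLA",
    "vmla_dylib": "VMLA;DyLIB",
    "vmla_vulkan": "VMLA;Vulkan",
}

for name, drivers in CONFIGS.items():
    build_dir = f"build-{name}"
    subprocess.run(
        ["cmake", "-B", build_dir, "-S", ".",
         "-DCMAKE_BUILD_TYPE=Release",
         f"-DIREE_HAL_DRIVERS_TO_BUILD={drivers}"],
        check=True)
    subprocess.run(["cmake", "--build", build_dir], check=True)
    # Assumed location of the tool inside the build tree.
    binary = os.path.join(build_dir, "iree", "tools", "iree-run-module")
    print(f"{name}: {os.path.getsize(binary)} bytes")
```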

pzread commented 2 years ago

@benvanik do you think this issue is still relevant? It looks like we also want to track the IREE runtime size in addition to the module size we are working on now.

If so, I'm happy to think about how to integrate this with the new benchmark infra.

GMNGeoffrey commented 2 years ago

There are like three open issues about this same thing. I'm not sure which to make canonical, but #9167, #7972 and #6161 all appear to be roughly the same

pzread commented 2 years ago

> There are like three open issues about this same thing. I'm not sure which to make canonical, but #9167, #7972 and #6161 all appear to be roughly the same

I think this one is different. It wants to track the size of the IREE runtime instead of the VMFB modules.

GMNGeoffrey commented 2 years ago

#6161 includes that. I'm not sure whether duping everything with "miscellaneous system states" is helpful or not, though.