iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Tracy does not show GPU profiles #13682

Open harishanand95 opened 1 year ago

harishanand95 commented 1 year ago

What happened?

I do not see a GPU profile option when opening the trace with ./iree-tracy-profiler 1.tracy. (Screenshot: Screenshot from 2023-05-18 08-50-15)

Steps to reproduce your issue

How I set up IREE and the environment. Python environment:

$ pip freeze | grep iree
iree-compiler==20230404.479
iree-runtime-instrumented==20230404.479
$ pip freeze | grep torch
torch @ https://github.com/llvm/torch-mlir/releases/download/snapshot-20230517.841/torch-2.1.0.dev20230512+cpu-cp311-cp311-linux_x86_64.whl
torch-mlir @ https://github.com/llvm/torch-mlir/releases/download/snapshot-20230517.841/torch_mlir-20230517.841-cp311-cp311-linux_x86_64.whl

IREE Setup

git clone https://github.com/openxla/iree.git
cd iree
git submodule update --init

cmake -G Ninja -B ../iree-build/ -S . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DIREE_ENABLE_ASSERTIONS=ON -DIREE_ENABLE_SPLIT_DWARF=ON -DIREE_ENABLE_THIN_ARCHIVES=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DIREE_ENABLE_LLD=ON -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DIREE_BUILD_TRACY=ON -DIREE_ENABLE_RUNTIME_TRACING=ON
cd ../iree-build/
cmake --build . --target iree-tracy-profiler iree-tracy-capture iree-tracy-csvexport

Configuration showed this warning message at the end:

-- IREE custom_dispatch/cuda/kernels ignored -- nvcc not found
CMake Warning at samples/custom_module/dynamic/CMakeLists.txt:12 (message):
  IREE_ENABLE_RUNTIME_TRACING enabled but it currently has issues with
  dynamic libraries
-- Configuring done
-- Generating done

Run tracy-capture

$ ./iree-tracy-capture -o 1.tracy
Connecting to 127.0.0.1:8086...
Queue delay: 14 ns
Timer resolution: 0 ns
   1.43 Kbps /138.5% =   0.00 Mbps | Tx: 905 bytes | 64 MB | 2.54 s
Frames: 2
Time span: 2.64 s
Zones: 5
Elapsed time: 2.4 s
Saving trace... done!
Trace size 1143 bytes (27.31% ratio)

Running iree-benchmark-module with TRACY_NO_EXIT set; it hangs at the end:

$ TRACY_NO_EXIT=1  iree-benchmark-module  --device=vulkan --module=model.vmfb --function=forward --input=1x3x32x32xf32
2023-05-18T08:54:17-07:00
Running /home/user/delete/debug/.venv/lib/python3.11/site-packages/iree/runtime/scripts/iree_benchmark_module/../../iree-benchmark-module
Run on (16 X 4850.19 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 1.47, 0.94, 0.63
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BM_forward/process_time/real_time      0.120 ms        0.030 ms         5540 items_per_second=8.31827k/s
^C

Here is the model.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 3, 3)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x
model = Model()
model.eval()
module = torch_mlir.compile(model, torch.ones(1, 3, 32, 32), 
                            output_type=torch_mlir.OutputType.LINALG_ON_TENSORS, 
                            use_tracing=True, 
                            ignore_traced_shapes=False)
with open('model.mlir', "w") as f:
    f.write(module.operation.get_asm())

Commands

python model.py
iree-compile --iree-hal-target-backends=vulkan-spirv --iree-vulkan-target-triple=rdna2-unknown-linux model.mlir -o model.vmfb
TRACY_NO_EXIT=1  iree-benchmark-module  --device=vulkan --module=model.vmfb --function=forward --input=1x3x32x32xf32

What component(s) does this issue relate to?

Tracy

Version information

iree-compiler==20230404.479
iree-runtime-instrumented==20230404.479

Additional context

No response

benvanik commented 1 year ago

I suspect you are not capturing the right process - you see nothing (not just no GPU), and you shouldn't ^C your traced application. In the title of your Tracy window you can see you were tracing some Python process (maybe?) instead - it should show iree-benchmark-module.

harishanand95 commented 1 year ago

Thanks! I see what you mean. It works now. The iree-benchmark-module on my PATH resolved to the Python wrapper script, so Tracy was capturing the Python process first.

Before (did not work; I had to ^C, and I'm inside the Python virtual environment):

$ TRACY_NO_EXIT=1  iree-benchmark-module  --device=vulkan --module=model.vmfb --function=forward --input=1x3x32x32xf32
2023-05-18T09:06:19-07:00
Running /home/user/delete/debug/.venv/lib/python3.11/site-packages/iree/runtime/scripts/iree_benchmark_module/../../iree-benchmark-module
Run on (16 X 4850.19 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 3.42, 1.62, 1.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BM_forward/process_time/real_time      0.120 ms        0.030 ms         5251 items_per_second=8.35573k/s
^C

$ which iree-benchmark-module 
/home/user/delete/debug/.venv/bin/iree-benchmark-module

$ cat /home/user/delete/debug/.venv/bin/iree-benchmark-module
#!/home/user/delete/debug/.venv/bin/python3.11
# -*- coding: utf-8 -*-
import re
import sys
from iree.runtime.scripts.iree_benchmark_module.__main__ import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
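The entry point above is a standard setuptools console script: it runs `main()` inside the Python interpreter rather than executing the native binary directly. The working command in the next comment bypasses it by pointing at the binary shipped inside the `iree.runtime` package directory. A small helper like the one below (the function name is hypothetical, and the package layout is an assumption based on the path shown in this thread) can locate that binary:

```python
import os

def native_tool_path(package_dir: str, tool: str = "iree-benchmark-module") -> str:
    """Return the expected path of the native binary inside the installed
    iree.runtime package directory (assumption: the wheel places it there,
    as the working invocation in this thread suggests)."""
    return os.path.join(package_dir, tool)

# Example: resolving against the installed package location.
# import iree.runtime
# print(native_tool_path(os.path.dirname(iree.runtime.__file__)))
```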

After (works; it stops automatically and Tracy shows the GPU profiles too):

$ TRACY_NO_EXIT=1 ~/delete/debug/.venv/lib/python3.11/site-packages/iree/runtime/iree-benchmark-module --device=vulkan --module=model.vmfb --function=forward --input=1x3x32x32xf32
2023-05-18T09:09:00-07:00
Running /home/user/delete/debug/.venv/lib/python3.11/site-packages/iree/runtime/iree-benchmark-module
Run on (16 X 4850.19 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
Load Average: 1.89, 1.46, 1.02
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BM_forward/process_time/real_time      0.120 ms        0.030 ms         5362 items_per_second=8.30396k/s
benvanik commented 1 year ago

weird! I'm not familiar with that mechanism but maybe that's a known issue with the venv releases. /cc @stellaraccident / @ScottTodd (or some other python person) - if this is a known issue we'll probably want to document it but it'd be nice if it worked without it

allieculp commented 1 year ago

@stellaraccident @ScottTodd Have either of you had a chance to review this? Is this a known issue?

ScottTodd commented 1 year ago

Haven't had a chance to dig into this, but I'd believe that the Python wrapper script + subprocess.call() are interfering with process selection: https://github.com/openxla/iree/blob/76e36ff694eeff16e1cf511979eae7646fc6c503/runtime/bindings/python/iree/runtime/scripts/iree_benchmark_module/__main__.py#L15-L17

We should figure out a workaround so the iree-runtime-instrumented Python package is useful, though I'd somewhat prefer to steer people away from Python and to native binary releases or source builds for some of the more involved workflows.
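One possible workaround (a sketch of the idea, not the project's actual fix): if the wrapper replaced `subprocess.call()` with `os.execv()`, the native binary would take over the Python process image, so the process Tracy attaches to would be `iree-benchmark-module` itself rather than a Python parent:

```python
import os
import sys

def exec_argv(binary: str, args):
    """Build the argv for execv: the binary path followed by forwarded args."""
    return [binary] + list(args)

def run_native(binary: str) -> None:
    # os.execv never returns on success: the native tool's image replaces this
    # Python process, keeping one PID from launch to exit, which is what a
    # Tracy client expects to connect to.
    os.execv(binary, exec_argv(binary, sys.argv[1:]))
```

Whether `execv` is viable here depends on details of the wrapper scripts (e.g. Windows support, where `exec` semantics differ), so treat this as a sketch only.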

allieculp commented 1 year ago

Setting this as a P2 for now - please adjust if needed.