pytorch / kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

Module view did not show up #798

Open dpan817 opened 11 months ago

dpan817 commented 11 months ago

I tried to profile the bevfusion model with the code below (with_modules=True). After running the inference and starting TensorBoard, the Module view did not show up in the Views drop-down list. The Overview, Operator, GPU Kernel, Trace, and Memory views worked properly.

Can anyone help figure out what is wrong?

import mmcv
import torch
import torch.profiler
from datetime import datetime

def single_gpu_test(model, data_loader):
    model.eval()
    results = []
    dataset = data_loader.dataset
    prog_bar = mmcv.ProgressBar(len(dataset))

    ## dpan: add profiler to collect performance data
    now = datetime.now()
    date_time_string = now.strftime("%Y%m%d%H%M%S")
    profiler_log = "/home/adlink/tensorboard/Profiler-" + date_time_string
    prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=5, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler(profiler_log),
        record_shapes=True,
        profile_memory=True,
        with_stack=True,
        with_flops=True,
        with_modules=True)
    prof.start() ## dpan: profiler start

    for data in data_loader:
        with torch.no_grad():
            result = model(return_loss=False, rescale=True, **data)
        results.extend(result)

        prof.step() ## dpan: call this at the end of each step to notify the profiler of the step boundary

        batch_size = len(result)
        for _ in range(batch_size):
            prog_bar.update()
    return results
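
For reference, with wait=1, warmup=5, active=3, repeat=2 the profiler idles for one step, warms up for five, records three, and repeats that cycle once more, so only steps 6-8 and 15-17 are recorded. A short sketch that prints which ProfilerAction the schedule assigns to each step:

import torch.profiler

# torch.profiler.schedule returns a callable that maps the step index to a
# ProfilerAction (NONE, WARMUP, RECORD, or RECORD_AND_SAVE).
sched = torch.profiler.schedule(wait=1, warmup=5, active=3, repeat=2)
for step in range(20):
    print(step, sched(step))
# Steps 6-8 and 15-17 are recorded; each cycle's trace file is written at its
# RECORD_AND_SAVE step (8 and 17), so the loop needs to run for roughly 18
# steps for both traces to appear in the log directory.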

Python, torch, and plugin versions:

(bevfusion_mit) adlink@Adlink-RTX3090:~/Downloads/Lidar_AI_Solution/CUDA-BEVFusion/bevfusion$ python --version
Python 3.8.16
(bevfusion_mit) adlink@Adlink-RTX3090:~/Downloads/Lidar_AI_Solution/CUDA-BEVFusion/bevfusion$ pip list | grep torch
pytorch-quantization     2.1.2
torch                    1.10.1
torch-tb-profiler        0.4.1
torchaudio               0.10.1
torchinfo                1.8.0
torchpack                0.3.1
torchvision              0.11.2
(bevfusion_mit) adlink@Adlink-RTX3090:~/Downloads/Lidar_AI_Solution/CUDA-BEVFusion/bevfusion$ 
aaronenyeshi commented 10 months ago

Hi, I believe the Module view is built on the older stack trace information. To enable it, you may need to collect your profiles using:

from torch._C._profiler import _ExperimentalConfig

prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=5, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler(profiler_log),
        record_shapes=True,
        profile_memory=True,
        with_stack=True,
        with_flops=True,
        with_modules=True,
        experimental_config=_ExperimentalConfig(verbose=True))
dpan817 commented 10 months ago

Thanks, Aaron, but _ExperimentalConfig is not available in torch until version 1.13. I will try it later with torch v1.13.1.
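
As a stopgap on older releases, a guarded import along these lines (a sketch, with profiler_log as a placeholder path) lets the same call run with or without the experimental config; without it the Module view simply remains unavailable:

import torch.profiler

profiler_log = "./tb_profiler_log"  # placeholder output directory

# Only pass experimental_config when the private _ExperimentalConfig class
# exists (torch >= 1.13); on older torch the profiler is created without it.
extra_kwargs = {}
try:
    from torch._C._profiler import _ExperimentalConfig
    extra_kwargs["experimental_config"] = _ExperimentalConfig(verbose=True)
except ImportError:
    pass

prof = torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=5, active=3, repeat=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler(profiler_log),
    record_shapes=True,
    profile_memory=True,
    with_stack=True,
    with_flops=True,
    with_modules=True,
    **extra_kwargs)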

EricLiuhhh commented 5 months ago

> Thanks, Aaron, but _ExperimentalConfig is not available in torch until version 1.13. I will try it later with torch v1.13.1.

Hi! I also encountered the same problem, and my PyTorch version is the same as yours. Have you solved it?