mlcommons / inference_results_v2.0

This repository contains the results and code for the MLPerf™ Inference v2.0 benchmark.
https://mlcommons.org/en/inference-datacenter-20/
Apache License 2.0

Finding kernel source #18

Open mahmoodn opened 1 year ago

mahmoodn commented 1 year ago

Hi. Using Nsight Compute, I see a kernel named `res2_sm_80_kernel` when running resnet50.

Finished setting up SUT.
Starting warmup. Running for a minimum of 0 seconds.
==PROF== Profiling "convActPoolKernelV2" - 0 (1/200000): 0%....50%....100% - 1 pass
==PROF== Profiling "res2_sm_80_kernel" - 1 (2/200000): 0%....50%....100% - 1 pass
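As the log shows, the harness launches on the order of 200000 kernels, so profiling every launch is slow. A hedged sketch of narrowing Nsight Compute to just the kernel in question (the report name is an assumption; the harness flags are elided as in the command below):

```shell
# Sketch: profile only the first launch of res2_sm_80_kernel instead of
# every kernel in the run. "res2_report" is a hypothetical output name.
ncu --kernel-name res2_sm_80_kernel \
    --launch-count 1 \
    -o res2_report \
    ./build/bin/harness_default --scenario Offline --model resnet50
```

The `--kernel-name` filter matches the demangled kernel name, so the profiler skips the other launches entirely rather than collecting and discarding them.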

The command is

./build/bin/harness_default \
--logfile_outdir="./build/logs/2023.02.20-15.12.02/m2022_TRT/resnet50/Offline" \
--logfile_prefix="mlperf_log_" \
--performance_sample_count=2048 \
--gpu_copy_streams=2 \
--gpu_inference_streams=1 \
--run_infer_on_copy_streams=false \
--warmup_duration=0.0 \
--gpu_batch_size=1024 \
--map_path="./data_maps/imagenet/val_map.txt" \
--tensor_path="./build/preprocessed_data/imagenet/ResNet50/int8_linear" \
--use_graphs=false \
--gpu_engines="./build/engines/m2022/resnet50/Offline/resnet50-Offline-gpu-b1024-int8.lwis_k_99_MaxP.plan" \
--mlperf_conf_path="./measurements/m2022_TRT/resnet50/Offline/mlperf.conf" \
--user_conf_path="./measurements/m2022_TRT/resnet50/Offline/user.conf" \
--max_dlas=0 \
--scenario Offline \
--model resnet50

I searched the code for the kernel source, but the name seems to belong to another NVIDIA library. Given the `sm_80` part, I suspect the name was generated by an engine builder, so the complete name doesn't appear verbatim in the libraries.

I also wasn't able to find the higher-level C++ function that invokes this kernel. Any idea how to find details of this kernel?
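One way to at least locate which closed-source library embeds the kernel is to scan the shared objects for the mangled/demangled name. A minimal sketch, assuming the name is stored as a printable string in the binary (the helper name and search directory are mine, not from this repo):

```shell
# find_kernel_lib: report which shared libraries in a directory contain
# a given kernel name as a printable string.
find_kernel_lib() {
    kernel="$1"
    libdir="$2"
    for lib in "$libdir"/*.so*; do
        [ -e "$lib" ] || continue          # glob may not match anything
        if strings "$lib" 2>/dev/null | grep -q "$kernel"; then
            echo "$lib"
        fi
    done
}

# Example (path is an assumption; point it at your TensorRT install):
# find_kernel_lib res2_sm_80_kernel /usr/lib/x86_64-linux-gnu
```

This only narrows down the owning library; for the actual machine code one would still need to extract the embedded cubins (e.g. with `cuobjdump`) from that library, since the source is not shipped.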

nv-ananjappa commented 1 year ago

That kernel might be part of the TensorRT library, which ships in binary form.

mahmoodn commented 1 year ago

Thanks for the reply. I recently ran into a problem with resnet50 and NVBit, which I have raised here. I think you have also worked with these two in your MLPerf tests. Have you seen anything similar before? I'm stuck on that problem, and the tools are closed source.