mlcommons / inference_results_v3.0

This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
https://mlcommons.org/en/inference-datacenter-30/
Apache License 2.0

NVIDIA RuntimeError: FP8 weight is not found in dir /work/build/models/bert/fp8/faster-transformer-bert-fp8-weights-scales #13

Open wohenniubi opened 1 year ago

wohenniubi commented 1 year ago

After modifying use_fp8 from False to True, as follows:

@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class H100_PCIE_80GB_CUSTOM(OfflineGPUBaseConfig):
    system = KnownSystem.H100_PCIe_80GB_Custom

    # Applicable fields for this benchmark are listed below. Not all of these are necessary, and some may be defined in the BaseConfig already and inherited.
    # Please see NVIDIA's submission config files for example values and which fields to keep.
    # Required fields (Must be set or inherited to run):
    gpu_batch_size: int = 0
    input_dtype: str = ''
...
    use_fp8: bool = True  # the default is False
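For context, the @ConfigRegistry.register decorator above keys the config class by harness type, accuracy target, and power setting. A minimal sketch of how such a keyed class registry typically works (the real ConfigRegistry is NVIDIA's; the names below are illustrative, not the actual implementation):

```python
class ConfigRegistrySketch:
    """Illustrative keyed class registry, loosely modeled on the
    ConfigRegistry.register usage shown above."""
    _registry = {}

    @classmethod
    def register(cls, *key):
        # Return a decorator that stores the config class under the given key
        # and hands the class back unchanged.
        def decorator(config_cls):
            cls._registry[key] = config_cls
            return config_cls
        return decorator


@ConfigRegistrySketch.register("Custom", "k_99", "MaxP")
class ExampleConfig:
    use_fp8 = True
```

Registration happens at import time, which is why simply editing the field in the registered class changes what the harness picks up on the next run.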

Running make generate_engines RUN_ARGS="--benchmarks=bert --scenarios=offline" then fails with RuntimeError: FP8 weight is not found in dir. The detailed error output is as follows:

[2023-07-12 18:57:23,967 main.py:231 INFO] Detected system ID: KnownSystem.H100_PCIe_80GB_Custom
[2023-07-12 18:57:26,192 generate_engines.py:172 INFO] Building engines for bert benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/../FasterTransformer/build/lib/libbert_fp8_plugin.so
[2023-07-12 18:57:26,220 bert_var_seqlen.py:67 INFO] Using workspace size: 0
[07/12/2023-18:57:26] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 38, GPU 928 (MiB)
[07/12/2023-18:57:32] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2981, GPU +750, now: CPU 3096, GPU 1680 (MiB)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/bert/tensorrt/bert_var_seqlen.py", line 210, in build_engines
    bert_squad_fp8_fastertransfomer(network, weights_dict, self.bert_config, self.seq_len)
  File "/work/code/bert/tensorrt/fp8_builder_fastertransformer.py", line 49, in bert_squad_fp8_fastertransfomer
    raise RuntimeError(f"FP8 weight is not found in dir {weightDirPath}, Exiting...")
RuntimeError: FP8 weight is not found in dir /work/build/models/bert/fp8/faster-transformer-bert-fp8-weights-scales/, Exiting...
[2023-07-12 18:57:36,206 generate_engines.py:172 INFO] Building engines for bert benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/../FasterTransformer/build/lib/libbert_fp8_plugin.so
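The raise at fp8_builder_fastertransformer.py line 49 is a fail-fast guard: the builder expects pre-generated FasterTransformer FP8 weight/scale files under build/models/bert/fp8/, and aborts when the directory is missing or empty. A hypothetical reconstruction of that check (the helper name and exact condition are assumptions, not the actual source):

```python
import os
import tempfile


def check_fp8_weights(weight_dir_path: str) -> None:
    """Abort early if the FP8 weight/scale directory is missing or empty,
    mirroring the guard the traceback shows in
    fp8_builder_fastertransformer.py (hypothetical sketch)."""
    if not os.path.isdir(weight_dir_path) or not os.listdir(weight_dir_path):
        raise RuntimeError(
            f"FP8 weight is not found in dir {weight_dir_path}, Exiting...")


# An empty directory reproduces the same error message as the traceback.
empty_dir = tempfile.mkdtemp()
try:
    check_fp8_weights(empty_dir)
except RuntimeError as err:
    print(err)
```

So the error means the FP8 weights were never generated or downloaded into the expected directory, independent of whether use_fp8 is set correctly in the config.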


This script creates a custom system definition within the MLPerf Inference codebase that matches the
hardware specifications of the system that it is run on. The script then does the following:

    - Backs up NVIDIA's workload configuration files
    - Creates new workload configuration files (configs/<Benchmark name>/<Scenario>/__init__.py) with dummy values
        - The user should fill out these dummy values with the correct values
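The backup-and-scaffold steps the script describes could be sketched roughly as follows (a hedged illustration; the real script is part of the MLPerf Inference codebase, and the paths and dummy field names here are assumptions based on the description above):

```python
from pathlib import Path
import shutil


def scaffold_config(repo_root: str, benchmark: str, scenario: str) -> Path:
    """Back up any existing workload config and write a dummy one for the
    user to fill in, following the two steps listed above (sketch only)."""
    cfg = Path(repo_root) / "configs" / benchmark / scenario / "__init__.py"
    cfg.parent.mkdir(parents=True, exist_ok=True)
    if cfg.exists():
        # Step 1: back up NVIDIA's existing workload configuration file.
        shutil.copy(cfg, Path(str(cfg) + ".bak"))
    # Step 2: write a dummy config the user must fill with correct values.
    cfg.write_text(
        "# Fill in real values before building engines.\n"
        "gpu_batch_size: int = 0\n"
        "input_dtype: str = ''\n"
    )
    return cfg


path = scaffold_config("/tmp/mlperf_demo", "bert", "Offline")
print(path)
```

The dummy values (gpu_batch_size = 0, empty input_dtype) match the placeholder fields visible in the config snippet at the top of this issue, which is why the user must replace them before an engine build can succeed.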

============= DETECTED SYSTEM ==============

SystemConfiguration:
    System ID (Optional Alias): H100_PCIe_80GB_Custom
    CPUConfiguration:
        2x CPU (CPUArchitecture.x86_64): Intel(R) Xeon(R) Platinum 8480+
            56 Cores, 2 Threads/Core
    MemoryConfiguration: 528.08 GB (Matching Tolerance: 0.05)
    AcceleratorConfiguration:
        2x GPU (0x233110DE): NVIDIA H100 PCIe
            AcceleratorType: Discrete
            SM Compute Capability: 90
            Memory Capacity: 79.65 GiB
            Max Power Limit: 310.0 W
    NUMA Config String: &

============================================