qwopqwop200 / GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ
Apache License 2.0

Benchmark broken on H100 #231

Open · FrederikAbitz opened this issue 1 year ago

FrederikAbitz commented 1 year ago
```
(textgen) ubuntu@anon:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ stdbuf --output=L python -u llama.py ~/text-generation-webui/models/llama-7b-hf c4 \
>     --wbits 4 \
>     --groupsize 128 \
>     --load ~/text-generation-webui/models/llama-7b-4bit-128g_true-seq_act-order.safetensors \
>     --benchmark 2048 \
>     --check 2>&1 \
> | tee llama-7b-4bit-128g_true-seq_act-order_bench.log
Loading model ...
/home/ubuntu/miniconda3/envs/textgen/lib/python3.11/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
Found 3 unique KN Linear values.
Warming up autotune cache ...
  0%|          | 0/12 [00:00<?, ?it/s]python: /opt/conda/conda-bld/torchtriton_1677881353797/work/lib/Dialect/TritonGPU/Transforms/Combine.cpp:870: int {anonymous}::{anonymous}::computeCapabilityToMMAVersion(int): Assertion `false && "computeCapability > 90 not supported"' failed.
```

Quantization itself works; only the benchmark is broken as of commit 05781593c818d4dc8adc2d32c975e83d17d2b9a8.
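
For context, the abort comes from the Triton build bundled with PyTorch (the `torchtriton` path in the traceback): its MMA-version lookup predates Hopper, and an H100 reports compute capability 9.0, i.e. 90, which trips the assert during the autotune warmup. A quick way to confirm an environment hits this case is sketched below; it is a minimal check that assumes a CUDA-enabled PyTorch and an importable `triton` package, and it only reads versions rather than patching anything:

```python
# Minimal diagnostic sketch: report the GPU's compute capability and the
# installed Triton version. Triton builds that predate sm_90 support abort
# with the "computeCapability > 90 not supported" assert on Hopper GPUs.
import torch
import triton

major, minor = torch.cuda.get_device_capability(0)
cc = major * 10 + minor  # H100 reports (9, 0) -> 90
print(f"GPU compute capability: sm_{cc}")
print(f"Triton version: {triton.__version__}")

if cc >= 90:
    print("Hopper-class GPU detected; expect the MMA-version assert "
          "with Triton builds that lack sm_90 support.")
```

If the check confirms sm_90 with an old bundled Triton, a newer Triton release with Hopper support should be needed before the Triton-based benchmark path can run on this card.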