It may be due to float precision. For fp16, setting atol = rtol = 1e-3 or 1e-2 is usually enough.
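For reference, a minimal sketch of such a comparison in NumPy; the dump file names here are placeholders:

```python
import numpy as np

# Placeholder dumps of the same tensor from two runs.
out_a = np.load("dump_a.npy")
out_b = np.load("dump_b.npy")

# fp16 carries only ~3 decimal digits of precision, so exact equality is
# not expected; atol = rtol = 1e-3 (or 1e-2) is a common starting tolerance.
np.testing.assert_allclose(out_a, out_b, rtol=1e-3, atol=1e-3)
```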
It seems that the precision error is huge for some elements. I compared the 00055_fused_NT_matmul5_cast2_arg2.npy files and got the following output:
Traceback (most recent call last):
  File "/home/mlc-llm/benchmark/acc_check.py", line 48, in <module>
    check_single(
  File "/home/mlc-llm/benchmark/acc_check.py", line 43, in check_single
    np.testing.assert_allclose(arr1, arr2, rtol=1e-2, atol=1e-2, verbose=True)
  File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.01

Mismatched elements: 129 / 32000 (0.403%)
Max absolute difference: 3.0947266
Max relative difference: 1527.6666
 x: array([[[ 0.031006,  0.15625 ,  0.862305, ..., -1.228516, -0.04248 ,
         -1.756836]]], dtype=float32)
 y: array([[[ 0.031006,  0.15625 ,  0.862305, ..., -1.228516, -0.04248 ,
         -1.756836]]], dtype=float32)
Also, the mismatched elements are always the same; hence, I suspect there's a bug in the generated HIP code.
A 0.403% mismatch usually doesn't bother me a lot, but the abs/relative diff looks a bit scary. Could you try atol = rtol = 1e-3? The abs and relative diffs may come from different elements.
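A quick sketch to check whether the two maxima come from different elements (the dump paths are assumptions; substitute whatever acc_check.py actually compares):

```python
import numpy as np

# Hypothetical dump files from two runs of the same kernel.
arr1 = np.load("run1/00055_fused_NT_matmul5_cast2_arg2.npy").astype(np.float64)
arr2 = np.load("run2/00055_fused_NT_matmul5_cast2_arg2.npy").astype(np.float64)

abs_diff = np.abs(arr1 - arr2)
# Guard against division by zero when the reference element is 0.
rel_diff = abs_diff / np.maximum(np.abs(arr2), np.finfo(np.float64).tiny)

# If these two indices differ, the large max relative diff likely comes
# from a near-zero element rather than the element with the largest error.
i_abs = np.unravel_index(abs_diff.argmax(), abs_diff.shape)
i_rel = np.unravel_index(rel_diff.argmax(), rel_diff.shape)
print("argmax abs diff:", i_abs, "reference value:", arr2[i_abs])
print("argmax rel diff:", i_rel, "reference value:", arr2[i_rel])
```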
Our ROCm backend has been evolving rapidly and has stabilized since mid-September; it now supports both single- and multi-GPU inference. With on-device compilation, it supports a broader range of AMD devices with different GFX architectures without having to rely on prebuilt models. More info: https://github.com/mlc-ai/llm-perf-bench#mlc-llm.
Let me know if it works on your end.
Hi Junru, thanks for your reply. Sorry for not updating you on the status from our side. This bug was actually solved a few weeks ago after we synced our local TVM repo with the GitHub one.
Also, we've successfully run Llama 13B, 30B, and 65B with the support of the TVM and MLC repos. Multi-GPU inference support is awesome. I can't wait to share more information, as well as make more contributions to the community, once the confidentiality period has passed.
We highly appreciate the efforts made by the TVM and MLC teams.
🐛 Bug
I'm compiling the Llama-13b-hf model with the MLC-LLM ROCm backend and found that the compiled model outputs different values each time. We found that the error was introduced mainly by matmul ops.
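A minimal sketch of the kind of run-to-run check used here, assuming each run dumps its intermediate tensors as .npy files into its own directory (the directory layout is hypothetical):

```python
import numpy as np
from pathlib import Path

# Hypothetical layout: two identical runs, each dumping intermediate
# tensors (e.g. per-kernel outputs) into its own directory.
run1, run2 = Path("dump_run1"), Path("dump_run2")

for f1 in sorted(run1.glob("*.npy")):
    f2 = run2 / f1.name
    if not f2.exists():
        continue
    a, b = np.load(f1), np.load(f2)
    # Identical inputs and weights should give bit-identical outputs on a
    # deterministic backend; any mismatch flags a nondeterministic kernel.
    if not np.array_equal(a, b):
        print(f"{f1.name}: max abs diff = {np.abs(a - b).max()}")
```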
To Reproduce
Steps to reproduce the behavior:
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU: OFF
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: OFF
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: ON
GIT_COMMIT_HASH: 392222e0216032509a635996b047f5c232b54402
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-09-11 14:30:10 +0800
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 16.0.6
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: none
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: ON
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/rocm/bin/hipcc
HIDE_PRIVATE_SYMBOLS: ON