mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

Failed to detect local GPU #317

Closed. chensinit closed this issue 1 year ago.

chensinit commented 1 year ago

🐛 Bug

Hello. I am trying to build a model, but my GPU is not detected and I get an error.

$ python build.py --hf-path=databricks/dolly-v2-3b --quantization q4f16_0 --target android --max-seq-len 768
Weights exist at dist/models/dolly-v2-3b, skipping download.
Using path "dist/models/dolly-v2-3b" for model "dolly-v2-3b"
Database paths: ['log_db/vicuna-v1-7b', 'log_db/rwkv-raven-3b', 'log_db/rwkv-raven-1b5', 'log_db/redpajama-3b-q4f16', 'log_db/dolly-v2-3b', 'log_db/rwkv-raven-7b', 'log_db/redpajama-3b-q4f32']
Target configured: opencl -keys=opencl,gpu -max_num_threads=256 -max_shared_memory_per_block=16384 -max_threads_per_block=256 -texture_spatial_limit=16384 -thread_warp_size=1
Failed to detect local GPU, falling back to CPU as a target
Automatically using target for weight quantization: llvm -keys=cpu
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights. Total param size: 1.4633262157440186 GB
Start storing to cache dist/dolly-v2-3b-q4f16_0/params
[0710/0710] saving param_709
All finished, 51 total shards committed, record saved to dist/dolly-v2-3b-q4f16_0/params/ndarray-cache.json
Save a cached module to dist/dolly-v2-3b-q4f16_0/mod_cache_before_build_android.pkl.
Dump static shape TIR to dist/dolly-v2-3b-q4f16_0/debug/mod_tir_static.py
Dump dynamic shape TIR to dist/dolly-v2-3b-q4f16_0/debug/mod_tir_dynamic.py
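
Note that the "Failed to detect local GPU" line refers to the host machine, not the Android target, which is still configured for OpenCL. As a minimal sketch, assuming TVM is installed locally, this is one way to check which device backends the TVM runtime can actually reach (an illustrative probe, not the exact check build.py performs):

```python
# Illustrative probe: list which TVM device backends are usable on this host.
# Assumes a local TVM install; these are standard tvm.runtime device constructors.
import tvm

for name, dev in [
    ("cuda", tvm.cuda(0)),
    ("opencl", tvm.opencl(0)),
    ("vulkan", tvm.vulkan(0)),
    ("metal", tvm.metal(0)),
]:
    # Device.exist is True only if TVM was built with that backend
    # and a matching physical device is present.
    print(f"{name:7s} detected: {dev.exist}")
```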

To Reproduce

Steps to reproduce the behavior:

python build.py --hf-path=databricks/dolly-v2-3b --quantization q4f16_0 --target android --max-seq-len 768

Expected behavior

The model build succeeds.

Environment

Additional context

junrushao commented 1 year ago
free(): invalid pointer
중지됨 (코어 덤프됨)  <--- Korean locale message for "Aborted (core dumped)"

This error is caused by symbol conflicts between TVM and PyTorch at program exit time, which you may safely ignore. The build itself should work according to the logs you shared.
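
If you want to double-check that the crash only happens after the build has finished, here is a quick sketch (paths taken from the log above) that verifies the generated artifacts are on disk:

```python
# Quick sanity check: confirm the artifacts reported in the build log exist.
# Paths come from the log output above for dolly-v2-3b with q4f16_0.
from pathlib import Path

out_dir = Path("dist/dolly-v2-3b-q4f16_0")
for artifact in [
    out_dir / "params" / "ndarray-cache.json",       # quantized weight index
    out_dir / "mod_cache_before_build_android.pkl",  # cached module before Android build
]:
    status = "found" if artifact.exists() else "missing"
    print(f"{artifact}: {status}")
```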

chensinit commented 1 year ago

Thank you!