mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Cannot find PackedFunc mlc.create_paged_kv_cache_generic #2025

Closed: ChenYang3024 closed this issue 5 months ago

ChenYang3024 commented 7 months ago

🐛 Bug

The runtime cannot find the PackedFunc mlc.create_paged_kv_cache_generic.

To Reproduce

Steps to reproduce the behavior:

When I run the following code:

```python
from pathlib import Path

from mlc_llm.model.llama.llama_model import LlamaConfig, LlamaForCasualLM

# Build the Llama-2-7B model definition from its Hugging Face config.
config = LlamaConfig.from_file(Path("~LLAMA/Llama-2-7b-hf/config.json"))
model = LlamaForCasualLM(config)

# Export to a TVM IRModule; with allow_extern=True a third value
# (external modules) is also returned and ignored here.
mod, _named_params, _ = model.export_tvm(
    spec=model.get_default_spec(),
    allow_extern=True,
)

# JIT-compile for CPU.
rt_mod = model.jit(spec=model.get_default_spec(), device="cpu")
```

the `model.jit` call raises:

```
InternalError: Check failed: (func.defined()) is false: Error: Cannot find PackedFunc mlc.create_paged_kv_cache_generic in either Relax VM kernel library, or in TVM runtime PackedFunc registry, or in global Relax functions of the VM executable
```
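The error message names three places the Relax VM searched for the function. The sketch below is a toy model of that lookup, assuming the search order is the order in which the message lists the sources; it is not TVM's implementation, and `resolve_packed_func` and the `demo.*` entries are hypothetical placeholders. It only illustrates why the call fails when none of the three sources provides `mlc.create_paged_kv_cache_generic`.

```python
from typing import Callable, Dict

# Hypothetical toy model, NOT TVM's implementation: each dict stands in for one
# of the three sources named in the Relax VM error message.
PackedFunc = Callable[..., object]


def resolve_packed_func(
    name: str,
    kernel_library: Dict[str, PackedFunc],
    runtime_registry: Dict[str, PackedFunc],
    relax_globals: Dict[str, PackedFunc],
) -> PackedFunc:
    """Return the first match for `name`, searching sources in the order the
    error message lists them; raise if none of them provides it."""
    for source in (kernel_library, runtime_registry, relax_globals):
        func = source.get(name)
        if func is not None:
            return func
    raise LookupError(
        f"Cannot find PackedFunc {name} in either Relax VM kernel library, "
        "or in TVM runtime PackedFunc registry, "
        "or in global Relax functions of the VM executable"
    )


# Placeholder contents: the executable still calls the extern function,
# but no source defines it.
kernels = {"demo.fused_matmul": lambda *args: "matmul"}
registry = {"demo.builtin": lambda *args: "builtin"}
relax_fns = {"demo.prefill": lambda *args: "prefill"}

target = "mlc.create_paged_kv_cache_generic"
try:
    resolve_packed_func(target, kernels, registry, relax_fns)
except LookupError as err:
    print(err)  # mirrors the InternalError text from the report

# In the toy, registering an implementation in any source makes lookup succeed.
registry[target] = lambda *args: "kv_cache"
print(resolve_packed_func(target, kernels, registry, relax_fns)())  # kv_cache
```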

Expected behavior

Environment

```
USE_NVTX:OFF USE_GTEST:AUTO SUMMARIZE:OFF USE_IOS_RPC:OFF USE_MSC:OFF USE_ETHOSU:OFF CUDA_VERSION:NOT-FOUND USE_LIBBACKTRACE:AUTO DLPACK_PATH:3rdparty/dlpack/include USE_TENSORRT_CODEGEN:OFF USE_THRUST:OFF USE_TARGET_ONNX:OFF USE_AOT_EXECUTOR:ON BUILD_DUMMY_LIBTVM:OFF USE_CUDNN:OFF USE_TENSORRT_RUNTIME:OFF USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR:OFF USE_CCACHE:AUTO USE_ARM_COMPUTE_LIB:OFF USE_CPP_RTVM:OFF USE_OPENCL_GTEST:/path/to/opencl/gtest USE_MKL:OFF USE_PT_TVMDSOOP:OFF MLIR_VERSION:NOT-FOUND USE_CLML:OFF USE_STACKVM_RUNTIME:OFF USE_GRAPH_EXECUTOR_CUDA_GRAPH:OFF ROCM_PATH:/opt/rocm USE_DNNL:OFF USE_VITIS_AI:OFF USE_MLIR:OFF USE_RCCL:OFF USE_LLVM:ON USE_VERILATOR:OFF USE_TF_TVMDSOOP:OFF USE_THREADS:ON USE_MSVC_MT:OFF BACKTRACE_ON_SEGFAULT:OFF USE_GRAPH_EXECUTOR:ON USE_NCCL:OFF USE_ROCBLAS:OFF GIT_COMMIT_HASH:f06d486b4a1a27f0bbb072688a5fc41e7b15323c USE_VULKAN:OFF USE_RUST_EXT:OFF USE_CUTLASS:OFF USE_CPP_RPC:OFF USE_HEXAGON:OFF USE_CUSTOM_LOGGING:OFF USE_UMA:OFF USE_FALLBACK_STL_MAP:OFF USE_SORT:ON USE_RTTI:ON GIT_COMMIT_TIME:2024-03-08 02:04:22 -0500 USE_HEXAGON_SDK:/path/to/sdk USE_BLAS:none USE_ETHOSN:OFF USE_LIBTORCH:OFF USE_RANDOM:ON USE_CUDA:OFF USE_COREML:OFF USE_AMX:OFF BUILD_STATIC_RUNTIME:OFF USE_CMSISNN:OFF USE_KHRONOS_SPIRV:OFF USE_CLML_GRAPH_EXECUTOR:OFF USE_TFLITE:OFF USE_HEXAGON_GTEST:/path/to/hexagon/gtest PICOJSON_PATH:3rdparty/picojson USE_OPENCL_ENABLE_HOST_PTR:OFF INSTALL_DEV:OFF USE_PROFILER:ON USE_NNPACK:OFF LLVM_VERSION:10.0.0 USE_MRVL:OFF USE_OPENCL:OFF COMPILER_RT_PATH:3rdparty/compiler-rt RANG_PATH:3rdparty/rang/include USE_SPIRV_KHR_INTEGER_DOT_PRODUCT:OFF USE_OPENMP:none USE_BNNS:OFF USE_CUBLAS:OFF USE_METAL:OFF USE_MICRO_STANDALONE_RUNTIME:OFF USE_HEXAGON_EXTERNAL_LIBS:OFF USE_ALTERNATIVE_LINKER:AUTO USE_BYODT_POSIT:OFF USE_HEXAGON_RPC:OFF USE_MICRO:OFF DMLC_PATH:3rdparty/dmlc-core/include INDEX_DEFAULT_I64:ON USE_RELAY_DEBUG:OFF USE_RPC:ON USE_TENSORFLOW_PATH:none TVM_CLML_VERSION: USE_MIOPEN:OFF USE_ROCM:OFF USE_PAPI:OFF USE_CURAND:OFF TVM_CXX_COMPILER_PATH:/usr/bin/c++ HIDE_PRIVATE_SYMBOLS:OFF
```

Additional context

MasterJH5574 commented 7 months ago

Thank you @ChenYang3024 for reporting this and raising a great point. We did not take JIT support into consideration when we introduced the paged KV cache and paged attention support. Right now the paged attention kernels can only run on GPU, so CPU JIT does not work at the moment. We will look into this and try to bring back CPU JIT support.
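To check in your own environment whether any loaded library has put `mlc.create_paged_kv_cache_generic` into the TVM runtime PackedFunc registry (one of the three places the error lists), here is a minimal sketch using `tvm.get_global_func` with `allow_missing=True`. The helper name `find_registered_func` is hypothetical. Run it in the same process and after the same imports as the failing script, because a function appears in the registry only once the shared library that defines it has been loaded. A `None` result covers only the registry; the VM kernel library and the executable's Relax functions are not inspected.

```python
from typing import Optional


def find_registered_func(name: str) -> Optional[object]:
    """Return the global PackedFunc `name`, or None if it is not registered.

    Also returns None when TVM itself cannot be imported.
    """
    try:
        import tvm  # assumed: the TVM Unity build used by mlc-llm
    except ImportError:
        return None
    # allow_missing=True makes get_global_func return None instead of raising.
    return tvm.get_global_func(name, allow_missing=True)


name = "mlc.create_paged_kv_cache_generic"
if find_registered_func(name) is None:
    print(f"{name} is not in the TVM runtime registry of this process")
else:
    print(f"{name} is registered")
```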

psunn commented 7 months ago

@MasterJH5574 Hi Ruihang, I hit the same problem when testing Llama models with the latest mlc-llm built from source. I got the following internal error:

```
...
File "/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/vm.cc", line 676
InternalError: Check failed: (func.defined()) is false: Error: Cannot find PackedFunc mlc.create_paged_kv_cache_generic in either Relax VM kernel library, or in TVM runtime PackedFunc registry, or in global Relax functions of the VM executable
```

Could you point me to the last revision in which this issue does not occur on the CPU target? Thank you.

BenchuYee commented 6 months ago

I am hitting this bug as well. Is there a way to solve it?

tqchen commented 5 months ago

The latest build should fix this issue on GPU; we do not have CPU support yet.