mlc-ai / binary-mlc-llm-libs


Llama2 70b is not working #99

Open · 9muso8 opened 4 months ago

9muso8 commented 4 months ago

Hi, I just downloaded the Colab notebook you provide at this link: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb. It works properly if I use the 7B model; however, if I change the settings in order to use the 70B model, I receive the following error:

```
InternalError                             Traceback (most recent call last)
in <cell line: 4>()
      2 from mlc_chat.callback import StreamToStdout
      3
----> 4 cm = ChatModule(
      5     model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
      6     model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"

5 frames
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.PackedFuncBase.__call__()

tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall()

tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall3()

tvm/_ffi/_cython/./base.pxi in tvm._ffi._cy3.core.CHECK_CALL()

/workspace/mlc-llm/cpp/llm_chat.cc in LoadParams()

InternalError: Traceback (most recent call last):
  7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1633
  6: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
        at /workspace/mlc-llm/cpp/llm_chat.cc:631
  5: LoadParams
        at /workspace/mlc-llm/cpp/llm_chat.cc:219
  4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>::AssignTypedLambda<void (*)(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>(void (*)(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue)#1}> >::Call(tvm::runtime::PackedFuncObj const, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue)
  3: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)
  2: tvm::runtime::relax_vm::NDArrayCacheMetadata::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
  1: tvm::runtime::LoadBinaryFromFile(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/file_utils.cc", line 121
InternalError: Check failed: (!fs.fail()) is false: Cannot open dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json
```
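The failing check at the bottom is the concrete problem: the runtime cannot open `dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json`, i.e. the quantized 70B weights were never downloaded into `dist/`. For reference, here is a minimal sketch of the setup cell I would expect to need, following the 7B pattern from the notebook; the 70B Hugging Face repo name below is my assumption from that naming scheme, not something I have verified:

```python
# Sketch, following the notebook's 7B setup. ASSUMPTION: the 70B weights
# are published under the same mlc-ai naming pattern on Hugging Face;
# verify the repo exists before running.
!git lfs install

# Prebuilt quantized weights (provides ndarray-cache.json and the shards).
!git clone https://huggingface.co/mlc-ai/Llama-2-70b-chat-hf-q4f16_1-MLC \
    dist/Llama-2-70b-chat-hf-q4f16_1-MLC

# Prebuilt model libraries (provides the CUDA .so referenced below).
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs

from mlc_chat import ChatModule

cm = ChatModule(
    model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
    model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so",
)
```

Even with the files in place, note that q4f16_1 is roughly 0.5 bytes per parameter, so the 70B weights are on the order of 35 GB and will not fit on the 16 GB T4 that a free Colab instance provides; a larger GPU would presumably be required.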