simonw / llm-mlc

LLM plugin for running models using MLC
Apache License 2.0

Troubleshooting VK_ERROR_OUT_OF_DEVICE_MEMORY #14

Open g-yziquel opened 9 months ago

g-yziquel commented 9 months ago

Hi.

I believe your README.md would benefit from a troubleshooting section. For example, running a prompt against llama2 gave me:

mini-me@virtucon ~> llm -m llama2 'hello, world.'
Error: Traceback (most recent call last):
  9: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1213
  8: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
        at /workspace/mlc-llm/cpp/llm_chat.cc:504
  7: LoadParams
        at /workspace/mlc-llm/cpp/llm_chat.cc:237
  6: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)>::AssignTypedLambda<void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)>(void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
  4: tvm::runtime::relax_vm::NDArrayCacheMetadata::FileRecord::ParamRecord::Load(DLDevice, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::function<void (tvm::runtime::NDArray, void const*, long)>) const
  3: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
  2: tvm::runtime::vulkan::VulkanDeviceAPI::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
  1: tvm::runtime::vulkan::VulkanBuffer::VulkanBuffer(tvm::runtime::vulkan::VulkanDevice const&, unsigned long, unsigned int, unsigned int)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/vulkan/vulkan_buffer.cc", line 61
InternalError: Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-2: VK_ERROR_OUT_OF_DEVICE_MEMORY
mini-me@virtucon ~ [1]>

This kind of error is a bit intimidating, to be honest.
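One way a troubleshooting section could make this less intimidating is to point at the only part of the traceback that matters, the final `Vulkan Error, code=...` line. Below is a minimal sketch, not part of llm-mlc or MLC: the function name `summarize_vulkan_error` and the `HINTS` table are hypothetical, and the hint text is my own reading of the trace, where the allocation fails inside `LoadParams`, i.e. while the model weights are being loaded onto the GPU.

```python
import re
from typing import Optional

# Matches the "Vulkan Error, code=<n>: <NAME>" tail of the TVM error message.
VULKAN_ERROR = re.compile(r"Vulkan Error, code=(-?\d+): (VK_[A-Z_]+)")

# Hypothetical hints keyed by Vulkan result name (an assumption, not an MLC API).
HINTS = {
    "VK_ERROR_OUT_OF_DEVICE_MEMORY": (
        "the GPU ran out of memory while allocating a buffer; "
        "a smaller or more heavily quantized model may fit"
    ),
}


def summarize_vulkan_error(message: str) -> Optional[str]:
    """Return a one-line summary if the text contains a Vulkan error, else None."""
    match = VULKAN_ERROR.search(message)
    if match is None:
        return None
    code, name = match.groups()
    hint = HINTS.get(name, "no hint available for this error")
    return f"{name} (code {code}): {hint}"
```

Feeding it the last line of the traceback above would yield a single sentence naming `VK_ERROR_OUT_OF_DEVICE_MEMORY` and its code, which is probably all a README troubleshooting entry needs to key on.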

g-yziquel commented 9 months ago

This error can be reproduced with the following Python snippet:

from mlc_chat import ChatModule
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")

See, for instance, the mlc.ai documentation, here:

https://llm.mlc.ai/docs/index.html#getting-started

That situation has, in fact, been reported upstream:

https://github.com/mlc-ai/mlc-llm/issues/974