TVMError: The output probabilities are all NaNs, can not sample from it

https://replicate.com/p/yzcvd2ublijzruz4nxkznwpbqe

Output

Prediction failed.

Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1492
6: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:858
5: mlc::llm::LLMChat::SampleTokenFromLogits(tvm::runtime::NDArray, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, picojson::value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, picojson::value> > >)
at /workspace/mlc-llm/cpp/llm_chat.cc:1098
4: mlc::llm::LLMChat::SampleFromProbOnCPU(float)
at /workspace/mlc-llm/cpp/llm_chat.cc:1342
3: _ZN3tvm7runtime13PackedFun
2: tvm::runtime::TypedPackedFunc<int (tvm::runtime::NDArray, double, double)>::AssignTypedLambda<int (*)(tvm::runtime::NDArray, double, double)>(int (*)(tvm::runtime::NDArray, double, double), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
1: tvm::runtime::relax_vm::SampleTopPFromProb(tvm::runtime::NDArray, double, double)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/relax_vm/lm_support.cc", line 471
TVMError: The output probabilities are all NaNs, can not sample from it

Logs

MLC is currently not using any LoRAs.
Your formatted prompt is:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Shorten the following text to 300 words or less, while preserving the
overall meaning and the most important points:
In 2006, I sold, for millions of dollars, an internet company that I had co-founded a few years earlier. It was a strange company for many reasons, not the least of which was that we had no employees from beginning to end. I wrote every line of code and did all the accounting and customer support. The terms of the deal were such that my co-founder and I didn't have to work for the acquiring company at all. We were free to move on to other things, and we did. A few months later, my wife and I moved from our 865-square-foot apartment near Boston to a country house 25 miles outside of Philadelphia. I had just turned 27. She went to her job, and I sat at home doing nothing for the first time in my life. We knew no one for 100 miles in any direction. Naturally, I started tinkering on the computer again, starting about a dozen side projects simultaneously. A year and a half later, I thought I was onto something. I noticed two things that bothered me about Google. Too much spam, all those sites with nothing but ads, and not enough instant answers. I kept going to Wikipedia and IMDB. I thought if I could easily pick out the spam and the answers, then I'd have a more compelling search engine. Both problems were harder to solve than I initially thought, but I thoroughly enjoyed the work and kept at it. Everyone I talked to about my search engine project thought I was nuts. You're doing what? Competing against Google? Why? How? Another year later, in the fall of 2008, I flipped the switch, unveiling my search engine to the public. DuckDuckGo had a rather uneventful launch, if you can even call it a launch. I posted it to a niche tech site called Hacker News, and that was the long and short of it. The post was entitled, What Do You Think of My New Search Engine? Like many entrepreneurs, I'm motivated by being on the cusp of something big, and I was at the point where I needed some validation. I can survive on little, but I needed something. I got it. 
Granted, the product wasn't anything you'd want to switch to at that point, and people let me know that. It was an internet forum, after all. However, I still felt there was genuine interest in a new search engine competitor. I could tell some people were growing wary of what Google was becoming. For example, those initial conversations led me to investigate search privacy and eventually become the search engine that doesn't track you, years before government and corporate surveillance became a mainstream issue. In any case, the response I received was enough motivation to keep me going, which brings me to traction. I needed some. Traction is the best way to improve your chances of startup success. Traction is a sign that something is working. If you charge for your product, it means customers are buying. If your product is free, it's a growing user base. Traction is powerful. Technical, market, and team risks are easier to address with traction. Fundraising, hiring, press, partnerships, and acquisitions all become much easier. In other words, traction trumps everything. My last startup had grown using two traction channels. First, search engine optimization, ranking high in search engines for relevant terms. And later, viral marketing, where your customers bring in other customers, such as by referring friends and family through use of the product. Viral marketing doesn't work well in search because you can't easily bake it into the product by putting stuff between people and their search results. So I tried search engine optimization. The terms search engine and search engines were too hard to rank for as the high ranking companies had been around for a decade and had tens of thousands of links pointing at them from their long histories. New search engine was much more in my grasp. I worked hard for many months to rank high for this phrase. The key to good search engine optimization, SEO, is getting links. 
As you will hear later in the SEO chapter, you need a strategy to get these links in a scalable way. Getting stories written about you in blogs and news outlets is a common SEO linking strategy. However, I hit saturation with that channel strategy pretty quickly, and it didn't get me to the top. Something more creative was required. After much brainstorming and experimenting, I eventually hit upon a good idea. I built a Karma widget that would display links to your social media profiles and how many followers you had on each service. People would embed it on their sites and at the bottom there would be a link back to DuckDuckGo that said, new search engine. This channel strategy worked beautifully. I was number one. Trouble was, not a ton of people make that search, about 50 a day. So while I did get some traction and a steady stream of new users, it leveled off pretty quickly. It wasn't enough traction to be meaningful. It didn't move the needle. I made two large traction mistakes here. First, I failed to have a concrete traction goal. In retrospect, to move the needle for my traction goals at the time, I needed more like 5,000 new visitors a day, not 50. Search engine optimization was not going to get me there. Second, I was biased by my previous experience. Just because my last company got traction in this way didn't mean it was right for every company. These are very natural mistakes to make. In fact, most startups make them. The most common startup trajectory now goes something like the following. Founders have an idea for a company they're excited about. Initial excitement turns into a struggle to build a product, but they do get something out the door. Launch! The founders expected customers to beat a path to their door, but unfortunately, that isn't happening. Getting traction was an afterthought, but now they are focused on it. They try what they know or what they've heard others do. 
Some Facebook ads, a little local PR, and maybe a smattering of blog posts. Then, they run out of money and the company dies. Sadly, this is the norm. Even sadder, often these products are actually on to something. That is, with the right traction strategy, they might have actually been able to get traction and not go out of business. Given my previous startup success, I thought I knew what I was doing. I was wrong. Luckily, I wasn't dead wrong. I had the money to self-fund through my traction mistakes, and so they didn't prove fatal for DuckDuckGo. Not everyone is as lucky. Right when I realized I was making these mistakes, I also realized I didn't know the right way to go about getting traction at all. I asked around. It turns out there was no good framework for good interaction. And that's how this book was born, way back in 2009. Around this time, I also started angel investing, and more seriously advising other startups. I saw firsthand similar struggles and mistakes. I also partnered with Justin Mayers, my co-author. Justin founded two startups, one of which was acquired, and recently ran growth at Exceptional Cloud Services, which was acquired by Rackspace in 2013 for millions. He's a growth expert in his own right. We set out to help startups get traction no matter what business they were in, from internet companies to local small businesses and everything in between. We drew on our own personal experiences, interviewed more than 40 founders, studied many more companies, and pulled out the repeatable framework they use to succeed. That framework is Bullseye, a simple three-step process for getting traction. Bullseye works for startups of all kinds, consumer or enterprise focused, large or small. Since DuckDuckGo's humble beginnings, we have grown five orders of magnitude, 10 times growth spurts, from that initial 100 searches a day to now over 10 million a day. 
Each step from 100 to 1,000 10,000 to 100,000, 1,000,000 to 10,000,000 involved figuring out how to get traction again. That's because, as you will hear, often what works in one growth stage eventually stops working. Thankfully, we had Bullseye to help us find the right traction channel strategy at the right time. After my search engine optimization mistake, we shifted to using content marketing, social and display ads, publicity, and most recently, business development. We've hit the bullseye repeatedly, and so can you.
[/INST]
Not using LoRA
Traceback (most recent call last):
File "/nix/store/8hdk34qmzqrqc10i5fzamlm7bksa888s-python3-3.11.4-env/lib/python3.11/site-packages/cog/server/worker.py", line 226, in _predict
for r in result:
File "/src/predict.py", line 203, in predict
for decoded_token in self.engine(
File "/src/src/inference_engines/mlc_engine.py", line 159, in __call__
self.cm._prefill(input=prompt, generation_config=generation_config)
File "/nix/store/8hdk34qmzqrqc10i5fzamlm7bksa888s-python3-3.11.4-env/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 992, in _prefill
self._prefill_func(
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/nix/store/8hdk34qmzqrqc10i5fzamlm7bksa888s-python3-3.11.4-env/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 1492, in mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 858, in mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 1098, in mlc::llm::LLMChat::SampleTokenFromLogits(tvm::runtime::NDArray, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, picojson::value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, picojson::value> > >)
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 1342, in mlc::llm::LLMChat::SampleFromProbOnCPU(float)
tvm._ffi.base.TVMError: Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1492
6: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:858
5: mlc::llm::LLMChat::SampleTokenFromLogits(tvm::runtime::NDArray, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, picojson::value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, picojson::value> > >)
at /workspace/mlc-llm/cpp/llm_chat.cc:1098
4: mlc::llm::LLMChat::SampleFromProbOnCPU(float)
at /workspace/mlc-llm/cpp/llm_chat.cc:1342
3: _ZN3tvm7runtime13PackedFun
2: tvm::runtime::TypedPackedFunc<int (tvm::runtime::NDArray, double, double)>::AssignTypedLambda<int (*)(tvm::runtime::NDArray, double, double)>(int (*)(tvm::runtime::NDArray, double, double), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
1: tvm::runtime::relax_vm::SampleTopPFromProb(tvm::runtime::NDArray, double, double)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/relax_vm/lm_support.cc", line 471
TVMError: The output probabilities are all NaNs, can not sample from it
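
For anyone trying to reason about this failure outside the model, here is a minimal sketch of how a single bad logit can turn the whole probability vector into NaNs, and how a sampler might refuse to draw from it. This is an illustration only, not the TVM code behind `SampleTopPFromProb`: the function names `softmax` and `sample_from_probs` are hypothetical, it ignores top-p and temperature, and it assumes the NaNs originate in the logits, which the logs above do not confirm.

```python
import math
import random


def softmax(logits):
    """Plain softmax over a list of floats (hypothetical helper).

    A NaN or +inf logit makes the normalising sum NaN, so every
    output probability becomes NaN, not just the bad entry.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_probs(probs, rng=random):
    """Draw an index from probs, refusing an all-NaN vector (hypothetical helper)."""
    if all(math.isnan(p) for p in probs):
        raise ValueError("probabilities are all NaN, cannot sample")
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    # Fall back to the last index if rounding left the sum just below u.
    return len(probs) - 1


if __name__ == "__main__":
    bad = softmax([1.0, float("nan"), 2.0])
    print(bad)  # every entry is nan
    try:
        sample_from_probs(bad)
    except ValueError as err:
        print("sampling refused:", err)
```

If the real cause is similar, checking the logits for NaN or inf right after prefill, before sampling, would show whether the problem is in the model output or in the sampling step.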

replicate / cog-llama-template

TVMError: The output probabilities are all NaNs, can not sample from it #86
