sobelio / llm-chain

`llm-chain` is a powerful Rust crate for building chains in large language models, allowing you to summarize text and complete complex tasks.
https://llm-chain.xyz
MIT License

fix: use block_in_place in llama tokenizer #235

Closed danbev closed 9 months ago

danbev commented 9 months ago

This commit changes the LLamaTokenizer to use tokio::task::block_in_place in an attempt to avoid the following error:

$ env RUST_BACKTRACE=1 \
 LLM_CHAIN_MODEL=models/llama-2-7b-chat.ggmlv3.q4_0.bin \
 cargo r --example few_shot

thread 'main' panicked at
'Cannot block the current thread from within a runtime. This happens
because a function attempted to block the current thread while the
thread is being used to drive asynchronous tasks.',
crates/llm-chain-llama/src/executor.rs:290:36
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_display
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:150:5
   3: core::panicking::panic_str
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:134:5
   4: core::option::expect_failed
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/option.rs:1932:5
   5: core::option::Option<T>::expect
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/option.rs:898:21
   6: tokio::future::block_on::block_on
             at /home/danielbevenius/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/future/block_on.rs:6:21
   7: tokio::sync::mutex::Mutex<T>::blocking_lock
             at /home/danielbevenius/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/sync/mutex.rs:510:9
   8: <llm_chain_llama::executor::LLamaTokenizer as llm_chain::tokens::Tokenizer>::tokenize_str
             at ./llm-chain-llama/src/executor.rs:290:23
   9: <llm_chain_llama::executor::Executor as llm_chain::traits::Executor>::tokens_used
             at ./llm-chain-llama/src/executor.rs:233:31

Fixes: https://github.com/sobelio/llm-chain/issues/211

Juzov commented 9 months ago

LGTM, does this require #[tokio::main(flavor = "multi_thread", worker_threads = 1)]?

danbev commented 9 months ago

> LGTM, does this require #[tokio::main(flavor = "multi_thread", worker_threads = 1)]?

Yes, it requires that the flavor is multi_thread, which I believe is the default, so it would also be possible to just specify #[tokio::main]. I just thought having it explicit might be helpful in the example, but I'd be happy to change this.