Zeyuan-Liu opened 1 month ago
Hi,
Our prompt is designed for base models (without instruction tuning or RLHF). Using a chat model may lead to unexpected output and the program may not parse it correctly. To debug, you may print the raw outputs of the LLM.
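For instance, a chat model's conversational preamble can break answer extraction even when the math is right. A minimal, self-contained illustration (the regex below is illustrative, not the repo's exact parsing code):

```python
import re

# Base-model completions end in a parseable sentence like "The answer is 18.",
# while a chat/RLHF model often wraps the result in conversational text.
# When the pattern finds nothing, the search sees no answer, which surfaces
# as "MCTSAggregation: no answer retrieved." and output=None.
answer_re = re.compile(r"[Tt]he answer is (-?\d+)")

base_output = "She sells 16 - 3 - 4 = 9 eggs for $2 each. The answer is 18"
chat_output = "Sure, I'd be happy to help! Janet makes $18 per day."

print(answer_re.search(base_output))  # matches; group(1) == '18'
print(answer_re.search(chat_output))  # None -> no answer retrieved
```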
Following the README.md, I tried to run RAP for gsm8k using exllama with the recommended command:

```
CUDA_VISIBLE_DEVICES=0,1 python examples/RAP/gsm8k/inference.py --base_lm exllama --exllama_model_dir my/path/to/Llama-2-7B-Chat-GPTQ --exllama_lora_dir None --exllama_mem_map '[16,22]' --n_action 1 --n_confidence 1 --n_iters 1 --temperature 0.0
```

but encountered the following RuntimeError:
```
Using the latest cached version of the dataset since gsm8k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'main' at /home/lzy/.cache/huggingface/datasets/gsm8k/main/0.0.0/1505e1f9da07dd20 (last modified on Sun Aug 4 12:00:13 2024).
gsm8k:   0%| | 0/1319 [00:00<?, ?it/s]
Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py:119: UserWarning: max_new_tokens is not set, we will use the default value: 200
  warnings.warn(f"max_new_tokens is not set, we will use the default value: {self.max_new_tokens}")
/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py:122: UserWarning: do_sample is False while the temperature is non-positive. We will use greedy decoding for Exllama
  warnings.warn(
/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py:144: UserWarning: the eos_token '\n' is encoded into tensor([29871, 13]) with length != 1, using 13 as the eos_token_id
  warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '
MCTSAggregation: no answer retrieved.
Case #1: correct=False, output=None, answer='18'; accuracy=0.000 (0/1)
gsm8k:   0%| | 1/1319 [00:09<3:35:53, 9.83s/it]
A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?
gsm8k:   0%| | 1/1319 [00:34<12:35:08, 34.38s/it]
Traceback (most recent call last):
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/inference.py", line 155, in <module>
    fire.Fire(main)
  File "/home/lzy/anaconda3/envs/agent/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/lzy/anaconda3/envs/agent/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/lzy/anaconda3/envs/agent/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/inference.py", line 146, in main
    rap_gsm8k(base_model=base_model,
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/inference.py", line 69, in rap_gsm8k
    accuracy = evaluator.evaluate(reasoner, num_shot=4, resume=resume, log_dir=log_dir)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/base.py", line 232, in evaluate
    algo_output = reasoner(self.input_processor(example),
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/base.py", line 183, in __call__
    return self.search_algo(self.world_model, self.search_config, **kwargs)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 314, in __call__
    self.search()
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 284, in search
    path = self.iterate(self.root)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 188, in iterate
    self._simulate(path)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 249, in _simulate
    self._expand(node)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 224, in _expand
    node.state, aux = self.world_model.step(node.parent.state, node.action)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/world_model.py", line 96, in step
    outputs = self.base_model.generate([model_input] * num,
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py", line 163, in generate
    decoded = self.generate_simple(self.generator, inputs[start:end], max_new_tokens=max_new_tokens,
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py", line 200, in generate_simple
    generator.gen_begin(ids, mask=mask)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/generator.py", line 186, in gen_begin
    self.model.forward(self.sequence[:, :-1], self.cache, preprocess_only = True, lora = self.lora, input_mask = mask)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 972, in forward
    r = self._forward(input_ids[:, chunk_begin : chunk_end],
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 1058, in _forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 536, in forward
    hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
  File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 440, in forward
    new_keys = cache.key_states[self.index].narrow(2, past_len, q_len).narrow(0, 0, bsz)
RuntimeError: start (3072) + length (6) exceeds dimension size (3072).
```
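The final `RuntimeError` comes from ExLlama's pre-allocated KV cache: the cache has a fixed number of sequence positions (3072 here), and once `past_len` reaches that limit there is no room to `narrow` a window for the next 6 tokens. A minimal sketch reproducing the failing call, with illustrative tensor shapes (the real cache shape and sizes come from the model config):

```python
import torch

# ExLlama allocates key/value caches with a fixed sequence dimension.
# Each step narrows a q_len-sized window starting at past_len; once
# past_len == max_seq_len, torch raises the same error as the traceback.
max_seq_len = 3072
key_states = torch.zeros(1, 2, max_seq_len, 8)  # (bsz, heads, seq, head_dim) - illustrative

past_len, q_len = 3072, 6  # the values reported in the RuntimeError
try:
    key_states.narrow(2, past_len, q_len)
except RuntimeError as exc:
    error = str(exc)
print(error)  # start (3072) + length (6) exceeds dimension size (3072)
```

So the crash means the running sequence (prompt plus generated tokens) outgrew the cache while processing the second question, separate from the chat-vs-base prompt issue noted above.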