Closed: Minami-su closed this issue 8 months ago
Hi! Could you provide more information, such as your configuration and which dataset you were evaluating when the out of memory issue occurred?
config.json
model:
  type: inf-llm
  path: Qwen1.5-0.5B-Chat
  block_size: 128
  n_init: 128
  n_local: 4096
  topk: 16
  repr_topk: 4
  max_cached_block: 32
  exc_block_size: 512
  score_decay: 0.1
  fattn: true
  base: 1000000
  distance_scale: 1.0

max_len: 2147483647
chunk_size: 8192
conv_type: qwen
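For reference, a minimal sketch of loading and inspecting this config, assuming it is YAML despite the config.json label and that PyYAML is installed; the file path below is hypothetical, and which fields dominate GPU memory is an assumption, not something confirmed in this thread:

```python
# Sketch: load the config above and print the fields assumed to matter most for
# GPU memory (block cache size, local window, prefill chunk length).
import yaml

with open("config/qwen-inf-llm.yaml") as f:  # hypothetical path to the config above
    cfg = yaml.safe_load(f)

m = cfg["model"]
print("block_size       =", m["block_size"])
print("max_cached_block =", m["max_cached_block"])
print("n_local          =", m["n_local"])
print("chunk_size       =", cfg["chunk_size"])
```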
bash scripts/longbench.sh
mkdir: cannot create directory ‘benchmark/longbench-result’: File exists
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Pred narrativeqa
0%| | 0/200 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0%| | 0/200 [00:05<?, ?it/s]
Traceback (most recent call last):
File "/home/luhao/InfLLM-main2/benchmark/pred.py", line 299, in <module>
preds = get_pred(
File "/home/luhao/InfLLM-main2/benchmark/pred.py", line 241, in get_pred
output = searcher.generate(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/utils/greedy_search.py", line 33, in generate
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/utils/greedy_search.py", line 55, in _decode
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1173, in forward
outputs = self.model(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/utils/patch.py", line 98, in model_forward
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 773, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/utils/patch.py", line 16, in hf_forward
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/attention/inf_llm.py", line 60, in forward
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/attention/context_manager.py", line 726, in append
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/attention/context_manager.py", line 616, in append_global
File "/root/anaconda3/envs/train/lib/python3.9/site-packages/inf_llm-0.0.1-py3.9.egg/inf_llm/attention/context_manager.py", line 20, in __init__
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
/root/anaconda3/envs/train/lib/python3.9/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
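As the log itself suggests, re-running with CUDA_LAUNCH_BLOCKING=1 gives a more accurate stack trace for the OOM, and TOKENIZERS_PARALLELISM=false silences the fork warning. A minimal Python wrapper sketch, assuming it is run from the repo root; the script name is taken from the command above:

```python
# Sketch: set the env vars suggested by the warnings/errors above before anything
# touches CUDA or the tokenizers, then launch the same benchmark script.
import os
import subprocess

env = dict(os.environ)
env["CUDA_LAUNCH_BLOCKING"] = "1"        # report CUDA errors synchronously
env["TOKENIZERS_PARALLELISM"] = "false"  # silence the tokenizers fork warning

subprocess.run(["bash", "scripts/longbench.sh"], env=env, check=True)
```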
I tested narrativeqa on an A40 (48 GB) with your settings and limited the CUDA memory usage to 24 GB; no out-of-memory error occurred. Did you use the Qwen/Qwen1.5-0.5B-Chat model from the Hugging Face Hub?
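For context, one way to cap a process at roughly 24 GB on a 48 GB card is PyTorch's per-process memory fraction; this is only a sketch of how such a cap could be reproduced, not necessarily how the test above was run:

```python
# Sketch: cap this process to ~24 GiB of CUDA memory on device 0.
import torch

total_bytes = torch.cuda.get_device_properties(0).total_memory
cap_bytes = 24 * 1024**3
torch.cuda.set_per_process_memory_fraction(cap_bytes / total_bytes, device=0)
```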
I want to know the expected GPU memory usage, because I run out of memory when I test the benchmarks.
Model: Qwen1.5-0.5B-Chat
GPU: RTX 3090 (24 GB)
Commands: bash scripts/infinitebench.sh, bash scripts/longbench.sh
Result: CUDA out of memory
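To answer the memory question empirically, peak usage can be read from PyTorch's allocator around a single prediction; a sketch, with the generate call taken from the traceback above and otherwise hypothetical:

```python
# Sketch: measure peak CUDA memory for one benchmark sample on the RTX 3090.
import torch

torch.cuda.reset_peak_memory_stats()
# output = searcher.generate(...)  # one call as in benchmark/pred.py (see traceback)
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
print(f"peak reserved : {torch.cuda.max_memory_reserved() / 1024**3:.2f} GiB")
```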