javierp183 opened 1 year ago
I get the same 'Killed' message when I run Single GPU inference without quantization on Linux:
python inference.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Hi, I think the problem is the amount of memory used when you don't apply quantization to the model.
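For reference, here is a rough way to estimate the RAM a checkpoint needs just to hold its weights (a sketch; actual peak usage is higher because of the tokenizer, activations, and copies made during conversion):

```python
# Rough estimate of RAM needed to hold a model's weights in memory.
# Actual peak usage is higher, so treat these numbers as lower bounds.

def model_ram_gb(n_params_billions: float, bytes_per_param: int) -> float:
    """Weight memory in GiB for a model of the given size and precision."""
    return n_params_billions * 1e9 * bytes_per_param / 1024**3

# llama-7b: ~26 GiB in fp32, ~13 GiB in fp16, ~6.5 GiB in int8
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"7B @ {name}: ~{model_ram_gb(7, nbytes):.1f} GiB")
```

This is why the 65B conversion below blows past 64 GB of RAM: even in fp16 the weights alone are over 120 GiB.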
I got the same while doing:
python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH --model_size 65B --output_dir ./converted_meta_hf_65 --to hf --max_batch_size 4
[1] 16261 killed python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path 65B
Ok, seems like I figured it out. It tries to load the whole model into RAM. I currently have 64 GB (62.7 usable) and had to allocate 70 GB of swap to make it work. 🤣
Also note that you should have enough disk space to do the conversion.
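In case it helps anyone else, allocating a swap file on Linux looks roughly like this (needs root; the 70G figure is just what worked for the 65B conversion here, so size it to your model):

```shell
# Create and enable a 70 GB swap file (Linux, run as root).
sudo fallocate -l 70G /swapfile   # reserve the space on disk
sudo chmod 600 /swapfile          # swap files must not be world-readable
sudo mkswap /swapfile             # format it as swap
sudo swapon /swapfile             # enable it immediately
free -h                           # verify the new swap shows up
# To keep it across reboots, append this line to /etc/fstab:
# /swapfile none swap sw 0 0
```

Swap this large will be slow, but it is enough to get a one-off conversion through instead of being Killed.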
Could you tell us how? Thanks.
Hello all, I installed the requirements of project but when I try to execute the following command:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
I got this message -> "Killed". Could you help me pin down the issue and fix it? Thanks.
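A bare "Killed" with no traceback usually means the kernel's OOM killer terminated the process. A quick pre-flight check (Linux-only sketch, reading `/proc/meminfo`) to see whether there is enough free RAM before launching the run:

```python
# Read MemAvailable from /proc/meminfo (Linux) and compare it against a
# rough weight-size estimate before starting a load or quantization run.

def available_ram_gb() -> float:
    """Currently available RAM in GiB, per the kernel's estimate."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / 1024**2  # kB -> GiB
    raise RuntimeError("MemAvailable not found; non-Linux system?")

if __name__ == "__main__":
    need_gb = 7 * 1e9 * 4 / 1024**3  # llama-7b weights in fp32, ~26 GiB
    have_gb = available_ram_gb()
    print(f"need ~{need_gb:.1f} GiB, have {have_gb:.1f} GiB available")
    if have_gb < need_gb:
        print("likely to be Killed -- add swap or quantize first")
```

You can also confirm an OOM kill after the fact with `sudo dmesg | grep -i "out of memory"`.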