Open irjawais opened 4 months ago
```
warnings.warn(
Loading checkpoint shards:  75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████          | 3/4 [01:53<00:37, 37.72s/it]Traceback (most recent call last):
  File "
```
@irjawais Can you check the memory usage when converting the model? From your description, it seems that there may be insufficient memory.
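To watch memory while the model loads, a minimal sketch that reads the process's resident set size from `/proc` (assumes Linux; the `rss_gib` helper is hypothetical, and `psutil` would be an equivalent option):

```python
import os

def rss_gib(pid: int = os.getpid()) -> float:
    """Return the resident set size of `pid` in GiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                kib = int(line.split()[1])  # /proc reports the value in kB
                return kib / (1024 ** 2)
    return 0.0

# Call this periodically (or from a background thread) during from_pretrained()
print(f"current RSS: {rss_gib():.2f} GiB")
```

Running `watch -n1 free -h` in another terminal during the load gives the same picture at the system level.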
When I load the "meta-llama/Meta-Llama-3-8B-Instruct" model like this:
```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # Hugging Face model_id or local model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
```
it hangs, and the only way to recover is to restart the instance.
Is there an issue with my spec?
My instance runs Ubuntu with 32 GB RAM.
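For a rough sense of whether 32 GB is enough, a back-of-envelope sketch of the memory needed just to hold the 8B weights at different precisions (the parameter count is approximate; conversion can transiently need more than the final size because the original-precision weights are held in memory while quantizing):

```python
# Approximate weight footprint of an ~8B-parameter model at common precisions.
params = 8e9  # Meta-Llama-3-8B has roughly 8 billion parameters

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024 ** 3
    print(f"{name}: {gib:.1f} GiB")
```

The fp16 checkpoint alone is around 15 GiB, so peak usage during loading plus 4-bit conversion can plausibly approach a 32 GB machine's limit once the OS and other processes are counted.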