Open JamieVC opened 4 months ago
You may clear the model with del llm_model.
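To illustrate why `del` frees the model: it drops the last strong reference, after which Python can reclaim the object. A minimal sketch (using a hypothetical `DummyModel` stand-in instead of a real LLM, and a weakref to observe collection):

```python
import gc
import weakref

class DummyModel:
    """Stand-in for a large LLM object (hypothetical)."""
    pass

llm_model = DummyModel()
ref = weakref.ref(llm_model)  # lets us observe when the object is freed

del llm_model   # drop the last strong reference
gc.collect()    # force collection so memory is reclaimed promptly

print(ref() is None)  # True: the model object has been freed
```

Note that `del` only removes the name; if anything else (such as a cache) still holds a reference to the model, the memory is not released.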
Thanks for the good idea of del llm_model, but I have another question.
create_model() is decorated with @st.cache_resource, as in the source code below. In my understanding, create_model() runs only once. After I delete the old model, I'd like to create a new model with create_model(). How do I make it rerun?
@st.cache_resource
def create_model(model_name):
    llm_model = IpexLLM.from_model_id(
        model_name=model_name,
        tokenizer_name=tokenizer_name,
        context_window=4096,
        max_new_tokens=512,
        load_in_low_bit='asym_int4',
        completion_to_prompt=completion_to_prompt,
        generate_kwargs={
            "do_sample": True,
            "temperature": 0.1,
            "eos_token_id": [tokenizer.eos_token_id,
                             tokenizer.convert_tokens_to_ids("<|eot_id|>")]},
        # messages_to_prompt=messages_to_prompt,
        device_map='xpu',
    )
    return llm_model
You may use st.cache_resource.clear() to make the function rerun and create a new model, as below:
model = create_model(name1)
del model
st.cache_resource.clear()
model = create_model(name2)
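The caching behavior behind this can be sketched with a minimal pure-Python stand-in for @st.cache_resource (the real decorator lives in Streamlit and also handles sessions and widget reruns, so this is an illustration only, not Streamlit's implementation):

```python
# Minimal stand-in for @st.cache_resource, for illustration only.
def cache_resource(func):
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)   # body runs only on a cache miss
        return cache[args]
    wrapper.clear = cache.clear         # mimics clearing the cache
    return wrapper

calls = []  # track how many times the body actually runs

@cache_resource
def create_model(model_name):
    calls.append(model_name)            # a real load would happen here
    return f"model:{model_name}"

m1 = create_model("llama2-7b-chat")
m1_again = create_model("llama2-7b-chat")   # cache hit: no reload
create_model.clear()                         # like st.cache_resource.clear()
m2 = create_model("llama2-7b-chat")          # body runs again after clearing

print(calls)  # ['llama2-7b-chat', 'llama2-7b-chat']
```

Note that the cache keys on the arguments, so calling create_model(name2) with a new name runs the function anyway; the clear() is what lets the cache drop its reference to the old model so its memory can actually be freed.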
I hope to switch between the llama2-7b-chat and llama3-8b models, but loading both at once costs a lot of memory. How do I clear the first model before loading the second one?