Open ganghe opened 2 weeks ago
test.py.txt The demo python code
This is the expected behavior of the Python virtual machine; we can't force the Python VM to release its CPU memory. After you del the model, the memory is free inside the VM. If you load a new model, the Python process reuses that memory instead of requesting more.
Hi Qiuxin,
Based on my observations, if you repeat these steps (load model + model.generate + del model) multiple times in the same process, the process's VM usage grows very large, and the system OOM killer then kills the process. Maybe you can try to reproduce this case to see whether we can improve the situation.
Thanks Gang
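One way to make the repeated load/generate/del observation measurable is to record the process's resident memory after each iteration. The following is only a minimal sketch under stated assumptions: `rss_mib`, `track_rss`, and `fake_step` are hypothetical names, `fake_step` is a stand-in for the real load + model.generate + del model sequence, and the `/proc/self/statm` read works on Linux only. It reports host (CPU) memory, not XPU device memory.

```python
import gc
import os


def rss_mib():
    """Resident set size of this process in MiB (Linux only, read from /proc)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field = resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def track_rss(step, iterations):
    """Run step() repeatedly and record the RSS after each run."""
    samples = []
    for _ in range(iterations):
        step()
        gc.collect()  # collect unreachable objects before sampling
        samples.append(rss_mib())
    return samples


def fake_step():
    # Hypothetical stand-in for: load model, model.generate(...), del model.
    buf = bytearray(8 * 1024 * 1024)
    del buf


if __name__ == "__main__":
    samples = track_rss(fake_step, 5)
    for i, mib in enumerate(samples, start=1):
        print(f"iteration {i}: RSS = {mib:.1f} MiB")
    print(f"growth from first to last: {samples[-1] - samples[0]:.1f} MiB")
```

If the RSS levels off after the first iteration, the freed memory is being reused; if it keeps climbing on every iteration, that would match the growth described above.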
I can't reproduce this after 20 iterations on the current nightly 2.1.0b20240701 + oneapi 2024.0 + intel-extension-for-pytorch 2.1.10+xpu.
Hi team,
I want to release the model's memory by deleting the model variable (del model) after generation, but it does not work as I expected. The demo code is below:
import torch
import time
import numpy as np

import intel_extension_for_pytorch as ipex

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "./baichuan2_model/baichuan-inc/Baichuan2-7B-Chat"

# Load model in 4 bit
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
model = model.half().to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "北京有哪些景点?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')

# ipex_llm model needs a warmup, then inference time can be accurate
output = model.generate(input_ids, max_new_tokens=32)
torch.xpu.synchronize()
output = output.cpu()
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_str)

input("please input enter to del model:")
model.to('cpu')
torch.xpu.synchronize()
torch.xpu.empty_cache()
del model
import gc
gc.collect()
input("please input enter to exit:")
I can see that the Python process's memory usage stays the same until the process exits. My environment is:
Linux Ubuntu 22.04
oneapi 24.01
ipex-llm 2.1.0b20240610
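Before looking at the allocator, it may be worth confirming that `del model` really removed the last reference to the object, because `del` only unbinds one name. The sketch below uses only the standard library; `DummyModel`, `is_collected`, and `extra` are hypothetical names, and `DummyModel` is a stand-in for the real model. It shows the general weak-reference technique and is not a diagnosis of this particular issue.

```python
import gc
import weakref


class DummyModel:
    """Hypothetical stand-in for the real model object."""

    def __init__(self):
        self.weights = bytearray(1024 * 1024)


def is_collected(ref):
    """Return True if the object behind a weakref.ref has been freed."""
    gc.collect()
    return ref() is None


model = DummyModel()
model_ref = weakref.ref(model)
extra = model  # a second reference, e.g. one held by a cache or a wrapper

del model
print("after del model:", is_collected(model_ref))  # False: `extra` still holds it

del extra
print("after del extra:", is_collected(model_ref))  # True: no references remain
```

If the weak reference is still alive after `del model` and `gc.collect()`, some other object is keeping the model reachable, and its memory cannot be freed yet.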