zhuansmueoague opened this issue 1 month ago
Hi @zhuansmueoague, thanks for opening this issue!
This is possibly a torch issue - calling `model.to(device)` just calls `nn.Module`'s `to` method under the hood. To check whether there's a memory leak, it would be best to see how the memory usage varies at each step of the for loop.
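A minimal way to log that per-step usage could look like the sketch below. This is illustrative only: the `rss_mb` helper and the small `nn.Linear` stand-in are assumptions, standing in for the actual `minicpm_v2` model from the report.

```python
import psutil
import torch
import torch.nn as nn

process = psutil.Process()

def rss_mb():
    # Resident set size of this process, in MiB
    return process.memory_info().rss / (1024 ** 2)

# Small stand-in model; the issue itself loads
# AutoModel.from_pretrained("minicpm_v2")
model = nn.Linear(1024, 1024)
device = "cuda" if torch.cuda.is_available() else "cpu"

for i in range(3):
    model.to("cpu")
    after_cpu = rss_mb()
    model.to(device)
    after_dev = rss_mb()
    print(f"step {i}: after to('cpu') {after_cpu:.1f} MiB, "
          f"after to('{device}') {after_dev:.1f} MiB")
```

If RSS climbs by roughly the same amount on every iteration, that points at the move itself; if only the first iteration grows, it is more likely allocator warm-up than a leak.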
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
latest
Who can help?
No response
Information
Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
```python
import torch
from transformers import AutoModel
import psutil

process = psutil.Process()

CPM_PATH = "minicpm_v2"
minicpm_model = AutoModel.from_pretrained(CPM_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16)
minicpm_model.to("cuda")

memory_info = process.memory_info()
memory_usage_MB = memory_info.rss / (1024 ** 2)
print(f"Memory usage: {memory_usage_MB}")

for i in range(10):
    minicpm_model.to("cpu")
    minicpm_model.to("cuda")

memory_info = process.memory_info()
memory_usage_MB = memory_info.rss / (1024 ** 2)
print(f"Memory usage: {memory_usage_MB}")
```
Expected behavior
```
Memory usage: 720.6875
Memory usage: 6311.3515625
```
The memory usage should be the same, since the model is on the GPU both at the beginning and at the end.
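Since the reply above suggests this may be a torch issue, one way to narrow it down is to run the same move loop on a plain `nn.Module` with no transformers involved. This is a hedged sketch; the layer sizes and loop count are arbitrary choices, not taken from the report.

```python
import psutil
import torch
import torch.nn as nn

process = psutil.Process()

def rss_mb():
    # Resident set size of this process, in MiB
    return process.memory_info().rss / (1024 ** 2)

# Plain torch model, no transformers involved
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(4)])
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

before = rss_mb()
for _ in range(10):
    model.to("cpu")
    model.to(device)
after = rss_mb()

print(f"before loop: {before:.1f} MiB")
print(f"after loop:  {after:.1f} MiB")
```

If RSS grows similarly here, the behavior comes from `nn.Module.to` itself and should be reported against torch; if it stays flat, the growth is specific to the `minicpm_v2` model's custom code loaded via `trust_remote_code=True`.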