huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Offloading to CPU increases memory #32654

zhuansmueoague opened this issue 1 month ago

zhuansmueoague commented 1 month ago

System Info

latest

Who can help?

No response

Information

Tasks

Reproduction

import torch
from transformers import AutoModel
import psutil

process = psutil.Process()

CPM_PATH = "minicpm_v2"
minicpm_model = AutoModel.from_pretrained(CPM_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16)
minicpm_model.to("cuda")

# RSS after the model has been moved to the GPU once
memory_info = process.memory_info()
memory_usage_MB = memory_info.rss / (1024 ** 2)
print(f"Memory usage: {memory_usage_MB}")

# Round-trip the model between CPU and GPU several times
for i in range(10):
    minicpm_model.to("cpu")
    minicpm_model.to("cuda")

# RSS after the round trips; the model is back on the GPU
memory_info = process.memory_info()
memory_usage_MB = memory_info.rss / (1024 ** 2)
print(f"Memory usage: {memory_usage_MB}")

Expected behavior

Memory usage: 720.6875
Memory usage: 6311.3515625

The memory usage should be the same, since the model is on the GPU both at the beginning and at the end.

amyeroberts commented 1 month ago

Hi @zhuansmueoague, thanks for opening this issue!

This is possibly a torch issue: calling model.to(device) just calls nn.Module's to method under the hood. To check whether there's a memory leak, it would be best to see how the memory usage varies at each step of the for loop.
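For reference, one way to instrument that loop is a minimal sketch along the lines below. It reuses the process handle and minicpm_model from the reproduction above; the gc.collect() and torch.cuda.synchronize() calls are additions here, only to make the per-step readings more stable.

import gc

for i in range(10):
    minicpm_model.to("cpu")
    minicpm_model.to("cuda")
    torch.cuda.synchronize()  # wait for pending host/device transfers to finish
    gc.collect()              # drop unreferenced Python objects before measuring
    rss_mb = process.memory_info().rss / (1024 ** 2)
    print(f"step {i}: RSS = {rss_mb:.1f} MB")

If the RSS jumps once after the first CPU round trip and then plateaus, the growth is more likely memory retained by the CPU-side allocator than a per-iteration leak; if it keeps climbing on every iteration, that would point to a genuine leak in .to().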

github-actions[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.