huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Offloading to CPU increases memory #32654

zhuansmueoague opened this issue 1 month ago

zhuansmueoague commented 1 month ago

System Info

latest

Who can help?

No response

Information

Tasks

Reproduction

import torch
from transformers import AutoModel
import psutil

process = psutil.Process()

CPM_PATH = "minicpm_v2"
minicpm_model = AutoModel.from_pretrained(CPM_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16)
minicpm_model.to("cuda")

# RSS after the model has been moved to the GPU once
memory_info = process.memory_info()
memory_usage_MB = memory_info.rss / (1024 ** 2)
print(f"Memory usage: {memory_usage_MB}")

# Round-trip the model between CPU and GPU several times
for i in range(10):
    minicpm_model.to("cpu")
    minicpm_model.to("cuda")

# RSS after the round trips; the model is back on the GPU
memory_info = process.memory_info()
memory_usage_MB = memory_info.rss / (1024 ** 2)
print(f"Memory usage: {memory_usage_MB}")

Expected behavior

Memory usage: 720.6875
Memory usage: 6311.3515625

The memory usage should be the same, since the model is on the GPU both at the beginning and at the end.

amyeroberts commented 1 month ago

Hi @zhuansmueoague, thanks for opening this issue!

This is possibly a torch issue: calling model.to(device) just calls nn.Module's to method under the hood. To check whether there's a memory leak, it would be best to see how the memory usage varies at each step of the for loop.
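For reference, one way to instrument that loop is a minimal sketch along the lines below. It reuses the process handle and minicpm_model from the reproduction above; the gc.collect() and torch.cuda.synchronize() calls are additions here, only to make the per-step readings more stable.

import gc

for i in range(10):
    minicpm_model.to("cpu")
    minicpm_model.to("cuda")
    torch.cuda.synchronize()  # wait for pending host/device transfers to finish
    gc.collect()              # drop unreferenced Python objects before measuring
    rss_mb = process.memory_info().rss / (1024 ** 2)
    print(f"step {i}: RSS = {rss_mb:.1f} MB")

If the RSS jumps once after the first CPU round trip and then plateaus, the growth is more likely memory retained by the CPU-side allocator than a per-iteration leak; if it keeps climbing on every iteration, that would point to a genuine leak in .to().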

github-actions[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.