tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
https://arxiv.org/abs/2210.07105
Apache License 2.0

The issue of LB-SAC GPU memory usage #57

Closed WuTi0525 closed 1 year ago

WuTi0525 commented 1 year ago

I ran the hopper expert task with 50 critic networks. On an RTX 3090, the process occupied approximately 3.7 GB of GPU memory according to the nvidia-smi command, whereas the paper reports 5.4 GB, which is a significant difference. What is the reason for this? Is there an error in how I am checking GPU memory, or does the GPU model matter? Looking forward to your reply!

DT6A commented 1 year ago

Hello, I think the reason might be the different GPU (we used an A100, if I'm not wrong). The CUDA and Torch versions might also affect memory usage.
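As a rough sanity check on why hardware and software versions could plausibly account for a gap of this size, the sketch below estimates how much memory the critic weights alone would need. The layer sizes are assumptions made for illustration (an 11-dimensional observation and 3-dimensional action for Hopper, three hidden layers of width 256, float32 weights); they are not taken from the thread, and `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed dimensions (not from the thread): Hopper observation and action sizes,
# three hidden layers of width 256, one scalar Q-value output.
STATE_DIM, ACTION_DIM, HIDDEN = 11, 3, 256
NUM_CRITICS = 50  # from the thread
BYTES_PER_PARAM = 4  # float32

layers = [STATE_DIM + ACTION_DIM, HIDDEN, HIDDEN, HIDDEN, 1]
per_critic = mlp_param_count(layers)
total_params = NUM_CRITICS * per_critic
weights_mib = total_params * BYTES_PER_PARAM / 2**20

print(f"parameters per critic: {per_critic}")
print(f"parameters in ensemble: {total_params}")
print(f"weight memory: {weights_mib:.1f} MiB")
```

Under these assumptions the ensemble weights come to a few tens of MiB, a small fraction of either 3.7 GB or 5.4 GB. Most of the reported figure would then come from other sources such as activations for the batch, optimizer state, the CUDA context, and memory cached by the framework's allocator, which is where GPU model, CUDA version, and Torch version could make a difference.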

WuTi0525 commented 1 year ago

So did you also use the nvidia-smi command to view the GPU memory occupied by the process? Indeed, the same program uses different amounts of GPU memory on different GPUs. I have noticed a possible pattern: GPUs with higher computing power also use more GPU memory for the same program. This holds on the GPUs we have, and your A100 fits the pattern too.

DT6A commented 1 year ago

I think we used wandb for memory usage tracking. @Howuhh, did we, or did we also use nvidia-smi?

Howuhh commented 1 year ago

Yeah, I used this https://pypi.org/project/nvidia-smi/ package, which should use nvidia-smi under the hood
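For anyone who wants to compare numbers in a repeatable way, here is a minimal sketch that reads per-process GPU memory from the nvidia-smi command line tool and parses it. It uses the `--query-compute-apps` and `--format` options of nvidia-smi; the function names `parse_compute_apps` and `query_compute_apps` are hypothetical helpers, and this is not the script used for the paper.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows (no header, no units) into {pid: MiB}."""
    usage = {}
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        pid, used = fields
        if not (pid.isdigit() and used.isdigit()):
            continue  # skip rows where memory is reported as unavailable
        usage[int(pid)] = int(used)
    return usage


def query_compute_apps():
    """Run nvidia-smi and return {pid: used MiB} for compute processes."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_apps(output)


if __name__ == "__main__":
    for pid, mib in query_compute_apps().items():
        print(f"pid {pid}: {mib} MiB")
```

The per-process figure reported this way covers everything the process holds on the device, including the CUDA context and any memory the framework keeps cached, so it is usually larger than the size of the tensors alone.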

WuTi0525 commented 1 year ago

So the GPU memory usage of my run probably differs from what is reported in the paper because of the different GPU, and perhaps the CUDA and Torch versions.

Howuhh commented 1 year ago

Yup, I think so

WuTi0525 commented 1 year ago

Thanks!
