zugexiaodui / torch_flops

A library for calculating the FLOPs in the forward() process based on torch.fx

memory profiler not working #5

Closed · BitCalSaul closed this issue 10 months ago

BitCalSaul commented 10 months ago

Hi, I installed the newest version from pip using `pip install torch_flops -i https://pypi.org/simple`. The previous version of torch_flops could be imported in my script, but the newest version cannot. With the correct conda environment activated, Python throws the error below:

[screenshot: import error traceback]

Thus, I cloned your repo with git and imported the function manually. However, when testing a linear model, the memory profiler does not seem to work, as shown below:

[screenshot: profiler output reports 0 memory usage]

Since the model and tensor are on the GPU, there should be some nonzero value instead of 0.
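
For reference, a minimal sketch of the setup I tested (a plain linear model on a non-default GPU; the `propagate` and printing calls below follow the library's README-style usage and are assumptions here, not the confirmed API):

```python
import torch
from torch import nn
from torch_flops import TorchFLOPsByFX

# A simple linear model and input placed on a non-default GPU.
device = torch.device('cuda:1')
model = nn.Linear(128, 64).to(device)
x = torch.randn(1, 128, device=device)

# Sketch of the profiling calls (method names are assumptions):
flops_counter = TorchFLOPsByFX(model)
flops_counter.propagate(x)
flops_counter.print_result_table()  # the memory column shows 0 here
```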

zugexiaodui commented 10 months ago

You can try `pip install torch_flops -i https://pypi.org/simple --upgrade` to install the new version. I will fix this import error in the next version.

As to the memory profiler, could you please share your code here?

BitCalSaul commented 10 months ago

Hi, I tried this command, but it still throws the same error:

[screenshot: the same import error]

As for the memory profiler, I found that if you put the model on `torch.device('cuda:0')`, the result is correct, but it is incorrect for other GPUs.

zugexiaodui commented 10 months ago

Sorry for the late release. I just uploaded the new code and fixed the import error. You can update the lib to v0.3.4 now.

The default GPU is 'cuda:0' and I didn't change it, so the library only supports 'cuda:0'. However, you can use the `CUDA_VISIBLE_DEVICES` environment variable to specify which GPU to use; the specified GPU will then be regarded as 'cuda:0'.
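
For example, setting the variable before CUDA is initialized makes the chosen physical GPU appear as 'cuda:0' inside the process (this is standard CUDA behavior, not specific to this library):

```python
import os

# Must be set before torch initializes CUDA; physical GPU 1 will then
# be the only visible device and will appear as 'cuda:0'.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import torch
device = torch.device('cuda:0')  # actually physical GPU 1
```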

zugexiaodui commented 10 months ago

And the new version supports measuring peak memory, which is an option of the TorchFLOPsByFX class.
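
A hedged sketch of what enabling it might look like (the constructor flag name here is hypothetical; check the README for the confirmed signature):

```python
import torch
from torch import nn
from torch_flops import TorchFLOPsByFX

model = nn.Linear(128, 64).cuda()
x = torch.randn(1, 128).cuda()

# 'record_memory' is a hypothetical name for the peak-memory option;
# the printing method name is likewise an assumption.
flops_counter = TorchFLOPsByFX(model, record_memory=True)
flops_counter.propagate(x)
flops_counter.print_max_memory()
```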

zugexiaodui commented 10 months ago

The "cuda != 0" bug has been fixed in version 0.3.5.

BitCalSaul commented 10 months ago

Thanks, the package can now be imported properly, and both the time and max memory used can be printed.