open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

High memory usage during inference. #7621

Closed. Thevakumar-Luheerathan closed this issue 2 years ago.

Thevakumar-Luheerathan commented 2 years ago

I am using the ssdlite_mobilenetv2 pretrained model for my project. The size of the pretrained checkpoint is around 17 MB, but when I load the model with init_detector(), the memory consumption is much higher (more than 2 GB across GPU and CPU memory). What is the reason for this unexpected behaviour? How can I reduce the memory consumption of model loading?

Note: I am using a Jetson Nano, and the model is running on the GPU.

Thank you.

pcicales commented 2 years ago

How many workers per GPU? What batch size? Also check whether you are computing gradients (i.e. make sure the model is in eval mode).
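
As a quick sanity check, disabling autograd during single-image inference looks roughly like the sketch below (the config/checkpoint paths are placeholders; point them at your own ssdlite_mobilenetv2 files):

```python
import torch
from mmdet.apis import init_detector, inference_detector

# Placeholder paths -- replace with your own ssdlite_mobilenetv2 config/checkpoint.
config_file = 'configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py'
checkpoint_file = 'ssdlite_mobilenetv2.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')
model.eval()  # init_detector already puts the model in eval mode; being explicit does no harm

with torch.no_grad():  # make sure no autograd buffers are kept during inference
    result = inference_detector(model, 'demo.jpg')
```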

Thevakumar-Luheerathan commented 2 years ago

I just used the methods provided in the mmdetection APIs (inference_detector, init_detector). I am using a single image, not a batch as such. I traced through the code to find out where the memory spikes and observed that it happens while loading the model for inference (i.e. in the init_detector function). Inside init_detector, model.to(device) adds about 1.6 GB of CPU memory and around 600 MB of GPU memory.

I am completely new to this. Is this behavior normal? How can I reduce memory usage?

pcicales commented 2 years ago

PyTorch uses some memory for metadata and CUDA context management, and the CUDA runtime itself also uses a significant amount of CPU memory (I think it has to do with unified memory). Can you please check the number of workers you have per GPU? You can set it to 1 or 0 and see how it impacts memory consumption. If it is more than 1, you are instantiating that many processes to load data in parallel, which means you are making copies in CPU memory.
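
To see how much of the footprint is just the CUDA context rather than the model, here is a minimal sketch (assuming psutil is installed) that measures process memory before and after the first CUDA call:

```python
import os
import psutil
import torch

proc = psutil.Process(os.getpid())
print(f'RSS before CUDA init: {proc.memory_info().rss / 1e6:.1f} MB')

torch.zeros(1, device='cuda')  # forces creation of the CUDA context
torch.cuda.synchronize()

print(f'RSS after CUDA init:  {proc.memory_info().rss / 1e6:.1f} MB')
print(f'GPU tensors allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB')
```

On a Jetson Nano, where CPU and GPU share the same physical memory, this context overhead counts against the same pool as the model weights.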

Thevakumar-Luheerathan commented 2 years ago

> PyTorch uses some memory for metadata and CUDA context management, and the CUDA runtime itself also uses a significant amount of CPU memory (I think it has to do with unified memory). Can you please check the number of workers you have per GPU? You can set it to 1 or 0 and see how it impacts memory consumption. If it is more than 1, you are instantiating that many processes to load data in parallel, which means you are making copies in CPU memory.

Sorry for the late reply. I used torch.utils.data.get_worker_info() to get the worker process info, and it printed None. Can I check the worker processes with this code, or how else can I check them? Thanks for the assistance.

pcicales commented 2 years ago

@Thevakumar-Luheerathan please check your config files; you will see a num_workers key in your dataset dicts.
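
In a typical mmdet 2.x config the dataloader worker count lives in the `data` dict, usually under `workers_per_gpu` rather than a literal `num_workers`. A minimal sketch of checking and overriding it (assuming mmcv 1.x; the config path is a placeholder):

```python
from mmcv import Config

cfg = Config.fromfile('path/to/your_config.py')  # placeholder path

# Depending on the mmdet version, the key is usually `workers_per_gpu` in the `data` dict.
print(cfg.data.get('workers_per_gpu', 'not set'))

# Force single-process data loading, e.g. for memory-constrained inference.
cfg.data.workers_per_gpu = 0
```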

figkim commented 2 years ago

Hi. I am also investigating the memory usage of init_detector and inference_detector.

@Thevakumar-Luheerathan

  1. Did you succeed in reducing memory?
  2. How did you profile the CPU RAM and GPU memory? Did you use `torch.profiler`? `psutil`?

@pcicales

  1. How can I find the num_workers setting? I am using the swinTransformer model from a fork of mmdet. I did not set num_workers and cannot find it in the dataset dicts of my config file:

     dataset_type = "CocoDataset"
     classes = ["box"]
     data = dict(test=dict(type=dataset_type, classes=classes, pipeline=test_pipeline))

  2. Is there any way to remove the mmdet.apis dependency when running inference with a model built from an mmdet configuration? In other words, is there a way to load and run inference from checkpoint files without mmdet.apis.init_detector or mmdet.apis.inference_detector? Or are there any sample practices I can refer to? (A rough sketch follows below.)
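
For reference on question 2: init_detector is a fairly thin wrapper, so doing the same steps by hand looks roughly like this sketch (assuming mmdet 2.x and mmcv 1.x; paths are placeholders, and this still needs mmdet itself for the model registry, just not mmdet.apis):

```python
from mmcv import Config
from mmcv.runner import load_checkpoint
from mmdet.models import build_detector

cfg = Config.fromfile('path/to/your_config.py')   # placeholder
cfg.model.pretrained = None                       # don't re-download backbone weights

model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
checkpoint = load_checkpoint(model, 'path/to/checkpoint.pth', map_location='cpu')

# Class names are stored in the checkpoint meta by mmdet's training scripts.
if 'CLASSES' in checkpoint.get('meta', {}):
    model.CLASSES = checkpoint['meta']['CLASSES']

model.cfg = cfg      # keep the config around for the test pipeline
model.to('cuda:0')
model.eval()
```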

Thevakumar-Luheerathan commented 2 years ago

> Hi. I am also investigating the memory usage of init_detector and inference_detector.
>
> @Thevakumar-Luheerathan
>
> 1. Did you succeed in reducing memory?
> 2. How did you profile the CPU RAM and GPU memory? Did you use `torch.profiler`? `psutil`?

No, I did not. As far as I remember, I used torch.profiler or something similar.
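
Since the exact profiling code was not shared, here is one minimal way to measure it, assuming psutil is installed (process RSS for CPU memory, torch.cuda counters for GPU tensors; paths are placeholders):

```python
import os
import psutil
import torch
from mmdet.apis import init_detector

proc = psutil.Process(os.getpid())

def report(tag):
    rss_mb = proc.memory_info().rss / 1e6
    gpu_mb = torch.cuda.memory_allocated() / 1e6 if torch.cuda.is_available() else 0.0
    print(f'{tag}: RSS {rss_mb:.1f} MB, GPU tensors {gpu_mb:.1f} MB')

report('before init_detector')
model = init_detector('path/to/config.py', 'path/to/checkpoint.pth', device='cuda:0')  # placeholders
report('after init_detector')
```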

nikky4D commented 2 years ago

> Hi. I am also investigating the memory usage of init_detector and inference_detector.
>
> @Thevakumar-Luheerathan
>
> 1. Did you succeed in reducing memory?
>
> 2. How did you profile the CPU RAM and GPU memory? Did you use `torch.profiler`? `psutil`?
>
> @pcicales
>
> 1. How can I find the `num_workers` setting? I am using the `swinTransformer` [model](https://github.com/SwinTransformer/Swin-Transformer-Object-Detection) from forked `mmdet`. I did not set `num_workers` and cannot find it in the dataset dicts of my config file:
>
>        dataset_type = "CocoDataset"
>        classes = ["box"]
>        data = dict(test=dict(type=dataset_type, classes=classes, pipeline=test_pipeline))
>
> 2. Is there any way to remove the `mmdet.apis` dependency when running inference with a model built from an mmdet configuration? In other words, is there a way to load and run inference from checkpoint files without `mmdet.apis.init_detector` or `mmdet.apis.inference_detector`? Or are there any sample practices I can refer to?

Did you get answers to these questions? I am running into similar issues and want to see how I can work around the memory problems with mmdet.