shangjie-li / pointpillars

Implementation of PointPillars in PyTorch for KITTI 3D Object Detection

Is it possible to reduce the inference time? #3

Open aatefi2 opened 11 months ago

aatefi2 commented 11 months ago

Hi shangjie-li,

I appreciate your help in applying your code for object detection on my lidar data. I am able to successfully train the model and get predictions on my training and test datasets. I need to run test.py for real-time object detection. When I run "python test.py --ckpt=output/ckpt/checkpoint_epoch_150.pth --save_to_file", it takes around 9 s to complete the prediction process (and save the results) for one point cloud file.

[Screenshot from 2023-11-27 16-15-01]

I am running the code on Ubuntu 22.04 with an NVIDIA Quadro P2000 Mobile (GP107GLM). When I run test.py, the CPU and GPU usage is about 100% and 7%, respectively. Any ideas on how to reduce the inference time?

Thank you, Abbas

shangjie-li commented 11 months ago

Glad to hear from you again.

  1. I'm not sure: is the NVIDIA Quadro P2000 Mobile a graphics card with performance similar to the NVIDIA GeForce GTX 1050 Ti Laptop GPU? I have used an NVIDIA GeForce GTX 1080 Ti to run this network, and the network inference time is only 10 to 20 ms. My method of calculating inference time is as follows:

1.1 Data loading, post-processing, and storage of results are not included.

1.2 The timing rules are as follows (just the time it takes the network to process one input tensor):

t0 = time.time()
pred_dicts, _ = model.forward(data_dict)
time_cost = time.time() - t0
  2. Data loading, post-processing, and storage of results consume CPU; only the network inference itself consumes GPU. I think most of the 9 s you mentioned was spent on data loading, which is why the CPU usage is very high and the GPU usage is very low.

  3. It should also be noted that the GPU takes a long time (a few seconds) to respond to the first frame of network inference, but in the following frames the inference speed behaves normally. Therefore, when measuring network inference time, the widely used method is to skip the first frame and take the average time of the following frames.

  4. You can try python demo.py --ckpt=output/ckpt/checkpoint_epoch_150.pth --compute_latency. With this command, the network runs inference 11 times on one frame of data; the time of the first run is ignored, and the average time of the subsequent 10 runs is reported. Check the result; my guess is it will be within 100 ms. (A minimal sketch of this timing procedure is shown below.)
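For reference, here is a minimal sketch of that warm-up-and-average procedure; model and data_dict are placeholders for the loaded network and one preprocessed input, and the torch.cuda.synchronize() calls are an added assumption for accurate GPU timing, since CUDA kernels run asynchronously:

import time

import torch

def measure_latency(model, data_dict, num_runs=11):
    # The first run warms up the GPU (CUDA context setup, cuDNN autotuning)
    # and is discarded; the times of the remaining runs are averaged.
    times = []
    with torch.no_grad():
        for i in range(num_runs):
            torch.cuda.synchronize()  # finish pending GPU work before timing
            t0 = time.time()
            pred_dicts, _ = model.forward(data_dict)
            torch.cuda.synchronize()  # wait until inference has finished
            if i > 0:
                times.append(time.time() - t0)
    return sum(times) / len(times)  # average latency in seconds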

aatefi2 commented 11 months ago

shangjie-li,

Thank you for your suggestion. I moved all the library imports from test.py into a separate Python script and then call test.py from that script with exec(open('test.py').read()). I am able to get predictions and also save the information of all predicted objects to a .txt file. The process now takes around 0.4-0.5 s.
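A sketch of this pattern (the wrapper filename is arbitrary; the speedup comes from Python caching modules in sys.modules, so the heavy imports are paid only once per long-lived process):

# run_loop.py: keep one Python process alive so torch and friends are
# imported only once; re-imports inside test.py then hit the module cache.
import torch

for _ in range(10):  # e.g. once per incoming point cloud
    exec(open('test.py').read())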

aatefi2 commented 11 months ago

Hi shangjie-li,

1) As you mentioned, data loading, post-processing, and storage of results consume CPU. I looked at the code and found that the booleans for using the CPU (dist_test and to_cpu) are set to False in test.py and pointpillar.py. It seems the GPU is used for loading data. Am I right? I also set all map_location=torch.device('cuda') in pointpillar.py to use the GPU. Do I need to change this part, since to_cpu=False?

2) For demo.py, if I set to_cpu=False (changing True to False), I assume the demo will run on the GPU. Am I right?

Thank you, Abbas

shangjie-li commented 11 months ago

  1. The way to check which device the data or the model is on is print(data_dict.device) or print(model.device). If the output is cpu, the data or the model is only on the CPU; if the output is cuda:0, it is on the GPU.

  2. In my code, the data and the model are processed on the GPU. You can find model.cuda() in eval_single_ckpt() of test.py, which loads the model onto the GPU. The process of loading data onto the GPU is in load_data_to_gpu() of pointpillar.py.

  3. Judging from 1 and 2, it is also correct to say that "the GPU is used for loading data": the .cuda() step moves data to the GPU. It is more accurate, though, to say that both the CPU and GPU are involved in data loading, because in my code the point cloud is first read with np.fromfile, which runs on the CPU, and then torch.from_numpy(val).float().cuda() moves it to the GPU (see the sketch after this list).

  4. dist_test means distributed testing, and this parameter is related to distributed training or testing. Because I only used one GPU for training and testing, I set dist_test to False throughout my code. In fact, much of my code is inherited from other repositories, and dist_test comes from their implementations; I have kept the interface but never tried setting dist_test to True.

  5. From what I understand, map_location=torch.device('cuda') is equivalent to map_location=torch.device('cpu') followed by model.cuda(); either way, the model ends up on the GPU.

  6. For demo.py, I use to_cpu=True first and then model.cuda(), so the model still ends up on the GPU. I don't think your changes will make a difference.
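To illustrate points 1 to 3 and 5, here is a minimal sketch of the read-on-CPU, move-to-GPU pattern; the file paths are placeholders, not the repository's exact code:

import numpy as np
import torch

# Read the raw point cloud on the CPU (KITTI-style .bin layout: x, y, z, intensity).
points = np.fromfile('000001.bin', dtype=np.float32).reshape(-1, 4)

# Move it to the GPU, mirroring torch.from_numpy(val).float().cuda()
# in load_data_to_gpu() of pointpillar.py.
points_gpu = torch.from_numpy(points).float().cuda()
print(points_gpu.device)  # prints cuda:0 once the tensor is on the GPU

# Point 5: loading a checkpoint onto the CPU and then calling model.cuda()
# ends with the weights on the GPU, just like map_location=torch.device('cuda').
ckpt = torch.load('checkpoint_epoch_150.pth', map_location=torch.device('cpu'))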

aatefi2 commented 11 months ago

shangjie-li,

Thank you for your suggestions. It seems all point cloud data must be associated with a non-empty label (.txt file) for the training process. I tried using an empty label .txt file, but I got an error, since the model looks for each object's name, location, size, and rotation. I also cannot train the model using fake labels with random class name(s) and all object values set to zero, because the background objects (not TPs) in my dataset vary widely, from a single point to large clustered objects. Accordingly, I cannot name all of them as one class, or even as several classes. I think the deep model will not be trained properly for the background objects.

Is it possible to feed some point cloud files (10% of the dataset) that include only background objects (random objects without the targeted objects (TPs)) into the training process? If yes, how can I do that?

shangjie-li commented 10 months ago

It seems all point cloud data must be associated with a non-empty label (.txt file) for the training process.

  1. That's right.

I think the deep model will not be trained properly for the background objects.

  1. I don't think so. For a frame of lidar point cloud containing several targets, the points of the targets are positive samples, and all other points are negative samples, so every frame already supplies plenty of background examples. I think the network can learn to tell which points belong to the background. (A schematic sketch follows below.)
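To illustrate, here is a schematic sketch of the IoU-based anchor assignment used by SSD-style detectors such as PointPillars; the thresholds are typical values and an assumption, not necessarily this repository's exact settings:

import numpy as np

def assign_anchor_targets(ious, pos_thresh=0.6, neg_thresh=0.45):
    # ious: (num_anchors, num_gt_boxes) IoU matrix between the anchors and
    # the labeled ground-truth boxes of one frame. Anchors overlapping a
    # labeled target strongly become positives; anchors over unlabeled
    # regions become negatives, so the network is trained on background in
    # every frame even though the background itself carries no labels.
    labels = np.full(ious.shape[0], -1, dtype=np.int64)  # -1: ignored
    if ious.shape[1] == 0:
        max_iou = np.zeros(ious.shape[0])
    else:
        max_iou = ious.max(axis=1)
    labels[max_iou < neg_thresh] = 0   # background (negative sample)
    labels[max_iou >= pos_thresh] = 1  # foreground (positive sample)
    return labels
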
aatefi2 commented 10 months ago

shangjie-li,

Thank you!