aatefi2 opened 11 months ago
Glad to hear from you again.
1.1 Data loading, post-processing, and storage of results are not included.
1.2 The timing rule is as follows (only the time the network takes to process one input tensor is measured):
t0 = time.time()
pred_dicts, _ = model.forward(data_dict)
time_cost = time.time() - t0
Data loading, post-processing, and storage of results consume CPU; only the network inference itself uses the GPU. I think most of the 9 s you mentioned was spent on data loading, which is why the CPU usage is very high and the GPU usage is very low.
It should also be noted that the GPU takes a long time (a few seconds) to respond to the first frame of network inference, but in the next few frames the inference speed behaves normally. Therefore, when measuring the network inference time, the widely used method is to skip the first frame and take the average time of the following frames.
You can try python demo.py --ckpt=output/ckpt/checkpoint_epoch_150.pth --compute_latency. Under this command, for one frame of data, the network will run inference 11 times. The inference time of the first run will be ignored, and the average inference time of the subsequent 10 runs will be calculated. You can check out the result; my guess is that it will be within 100 ms.
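For reference, a minimal sketch of that measurement pattern (this is not the actual implementation behind --compute_latency; the model and data_dict arguments and the explicit torch.cuda.synchronize() calls are assumptions for illustration):

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, data_dict, n_runs=10):
    # Warm-up: the first forward pass is slow (CUDA initialization, kernel
    # setup), so it is run once and excluded from the statistics.
    model.forward(data_dict)
    torch.cuda.synchronize()  # make sure the warm-up has really finished

    total = 0.0
    for _ in range(n_runs):
        t0 = time.time()
        model.forward(data_dict)
        torch.cuda.synchronize()  # wait for the GPU before stopping the clock
        total += time.time() - t0
    return total / n_runs  # average inference time in seconds
```

Note that data loading, post-processing, and saving of results stay outside the timed region, which matches the timing rule above.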
shangjie-li,
Thank you for your suggestion. I copied all the library imports from test.py into a Python script and then call test.py from that script via exec(open('test.py').read()). I am able to get predictions and also save the information of all predicted objects in a .txt file. The process takes around 0.4-0.5 s.
Hi shangjie-li,
1) As you mentioned, data loading, post-processing, and storage of results consume CPU. I looked at the code and found that all the booleans for using the CPU (dist_test, to_cpu) are set to False in test.py and pointpillar.py. It seems the GPU is used for loading data. Am I right?
I also set all map_location=torch.device('cuda') in pointpillar.py to use the GPU. Do I need to change this part, since to_cpu=False?
2) For demo.py, if I use to_cpu=False (changing True to False), I assume the demo will run on the GPU. Am I right?
Thank you, Abbas
The way to check which device the data or the model is located on is print(data_dict.device) or print(model.device). If the output is cpu, the data or the model is loaded only to the CPU. If the output is cuda:0, the data or the model is loaded to the GPU.
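If data_dict is a plain Python dict of tensors and model is a standard nn.Module (which does not carry a single .device attribute of its own), the same check can be done per parameter and per entry; a minimal sketch under that assumption:

```python
import torch

def print_devices(model, data_dict):
    # Device of the model: inspect the first parameter, since a plain
    # nn.Module has no .device attribute itself.
    print('model:', next(model.parameters()).device)
    # Device of each tensor in the input dict.
    for key, val in data_dict.items():
        if isinstance(val, torch.Tensor):
            print(key, ':', val.device)
```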
In my code, the data and the model are processed on the GPU. You can find model.cuda() in eval_single_ckpt() of test.py, which loads the model to the GPU. The process of loading data to the GPU is in load_data_to_gpu() of pointpillar.py.
Judging from 1 and 2, it is also correct to say that "the GPU is used for loading data", since the .cuda() step copies the data to the GPU. It is more accurate to say that both the CPU and the GPU are involved in the data loading process: in my code, the point cloud data is first read with np.fromfile, which is done by the CPU, and then torch.from_numpy(val).float().cuda() transfers it to the GPU.
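A rough sketch of that two-step path (not the exact code of load_data_to_gpu(); the file path, dtype, and 4-column reshape follow the common KITTI .bin layout and are assumptions here):

```python
import numpy as np
import torch

def load_points_to_gpu(bin_path):
    # Step 1 (CPU): read the raw point cloud from disk into a numpy array.
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    # Step 2 (GPU transfer): wrap it as a tensor and copy it onto the GPU.
    return torch.from_numpy(points).float().cuda()
```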
dist_test means distributed test, and this parameter is related to distributed training or testing. Because I only used one GPU for training and testing, I set dist_test to False throughout my code. In fact, much of my code is inherited from other repositories, and dist_test is an implementation detail of those repositories. I have kept the interface, but I have never tried setting dist_test to True.
From what I understand, map_location=torch.device('cuda') is equivalent to map_location=torch.device('cpu') followed by model.cuda(). Either way, the model ends up loaded on the GPU.
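A small sketch of that equivalence (the checkpoint path and the stand-in model are placeholders, not taken from the repository):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the detection network

# Option A: map the checkpoint tensors directly onto the GPU while loading.
state = torch.load('checkpoint.pth', map_location=torch.device('cuda'))
model.load_state_dict(state)
model.cuda()

# Option B: map onto the CPU first, then move the model to the GPU afterwards.
state = torch.load('checkpoint.pth', map_location=torch.device('cpu'))
model.load_state_dict(state)
model.cuda()

# Either way, the parameters used at inference time end up on the GPU.
```

Option B corresponds to the to_cpu=True followed by model.cuda() pattern described next for demo.py.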
For demo.py, I used to_cpu=True first and then model.cuda(), so the model still ends up loaded on the GPU. I don't think your changes will make a difference.
shangjie-li,
Thank you for your suggestions. It seems all point cloud data should be associated with a non-empty label (.txt file) for the training process. I tried to use an empty .txt label file, but I got an error because the model looks for the object name and its location, size, and rotation. I also cannot train the model using a fake label with random class names and all object values set to zero, because the background objects (not TPs) in my dataset vary enormously, from a single point to a big cluster. Accordingly, I cannot assign all of them to one class, or even to several classes. I think the deep model will not be trained properly for the background objects.
Is it possible to feed some point cloud files (10% of the dataset) that contain only background objects (random objects without the targeted objects (TPs)) into the training process? If yes, how can I do that?
It seems all point cloud data should be associated with a non-empty label (*.txt file) for the training process.
- That's right.
I think the deep model will not be trained properly for the background objects.
- I don't think so. For a frame of lidar point cloud (containing several targets), the points of the targets are positive samples (TP), and all other points are negative samples. I think the network can learn to tell which points belong to the background.
shangjie-li,
Thank you!
Hi shangjie-li,
I appreciate your help in applying your code for object detection on my lidar data. I am able to successfully train the model and get predictions on my training and test datasets. I need to run test.py for real-time object detection. When I run python test.py --ckpt=output/ckpt/checkpoint_epoch_150.pth --save_to_file, it takes around 9 s to complete the prediction process (and save the results) for one point cloud file.
I am running the code on Ubuntu 22.04 with an NVIDIA Quadro P2000 mobile (GP107 GLM). When I run test.py, the CPU and GPU usage is about 100% and 7%, respectively. Any idea how to reduce the inference time?
Thank you, Abbas