hoiliu-0801 / DNTR

A DeNoising FPN with Transformer R-CNN for Tiny Object Detection
Apache License 2.0

time cost during training period #9

Open Quickcheck opened 1 week ago

Quickcheck commented 1 week ago

I'm currently using the DNTR project to train on my own dataset, but something abnormal is happening during training. My dataset contains about 40,000 images, most of which are smaller than 400×400 pixels, and the number of classes is 2. The GPU is an RTX 3090, and the config file is aitod_DNTR_MASK.PY, where the input size is set to 256×192. The samples_per_gpu and workers_per_gpu parameters are set to 4 and 8, respectively. The abnormal part is that training takes about 4 hours per epoch. Could you suggest which part of the training pipeline could cause such a huge time cost? Or are there any parameters that are set incorrectly or not set at all?

Looking forward to your reply :)

Quickcheck commented 6 days ago

Hi, author. After measuring the time consumption of each code module in the training pipeline, I found that most of the time cost comes from the tensor.cpu().numpy() operation in the Token_pair_block class, which in my case takes about 1 second in total. I tried keeping the tensors "index" and "token_compare" on the GPU, but the time consumption of the gpu_repair function then rose to 8 seconds, whereas it used to be 0.01 seconds. Does this also occur in your training runs? I'm trying to optimize the time consumption of the gpu_repair function; could you give some advice? It is a bit difficult for me to understand the purpose of this code module due to the lack of textual explanation.
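One possible explanation for the 0.01 s vs. 8 s discrepancy is CUDA's asynchronous execution: a `.cpu().numpy()` call forces a device synchronization, so it can absorb the cost of all kernels queued before it, and naive wall-clock timing then misattributes where the time is spent. A minimal sketch of synchronized timing (the helper name `timed` and the example workload are illustrative, not from DNTR):

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Time a callable, synchronizing CUDA before and after so that
    previously queued kernels are not misattributed to this call."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

# Example: time a matmul and a device-to-host copy separately.
x = torch.randn(512, 512)
y, t_mm = timed(torch.matmul, x, x)
_, t_copy = timed(lambda: y.cpu().numpy())
```

Timing each module this way would show whether gpu_repair is genuinely slow on GPU or is simply the first synchronization point after a queue of earlier kernels.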

hoiliu-0801 commented 6 days ago

For the details of the algorithm, please refer to Section 4-B (Task Token Selection). The pseudocode can be found in the "gpu_pair" function. I have already optimized the speed using the "gpu_pair" and "torch.gather" functions, but feel free to explore further optimizations for even faster performance.
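The general pattern behind a `torch.gather`-based optimization is to select tokens by index entirely on-device instead of round-tripping through `tensor.cpu().numpy()`. A hypothetical sketch (function name, shapes, and scoring are illustrative, not DNTR's actual code):

```python
import torch

def select_topk_tokens(tokens, scores, k):
    """Pick the k highest-scoring tokens per image without leaving the GPU.

    tokens: (B, N, C) token features
    scores: (B, N)   per-token scores
    returns: (B, k, C) selected tokens
    """
    idx = scores.topk(k, dim=1).indices                      # (B, k)
    idx_exp = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return torch.gather(tokens, 1, idx_exp)                  # (B, k, C)

tokens = torch.randn(2, 10, 4)
scores = torch.randn(2, 10)
picked = select_topk_tokens(tokens, scores, 3)
```

Because `topk` and `gather` run as single batched kernels, this avoids both Python-level loops over tokens and host/device transfers.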

Quickcheck commented 5 days ago

Thanks for your reply. The previously mentioned time cost of the tensor.cpu().numpy() operation turned out not to be that large when the training code ran on another computer in our lab. On that device, the main time cost comes from TwoStageDetector: the extract_feat function costs 0.08 s, the contrastive loss 0.15 s, the backbone function 0.03 s, and roi_head 0.6 s in my case. It seems that the result x of the first extract_feat call is unused, and you mentioned that the geo and sem losses can be ignored during the pretraining period, so I guess at least 0.26 s of time cost (0.08 + 0.03 + 0.15) could be saved during pretraining.

hoiliu-0801 commented 1 day ago

To implement our ideas in DN-FPN, we utilized multiple for-loops for geo and sem losses, which may result in increased operation time. Thank you for sharing your results.
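For readers hitting the same bottleneck: pairwise losses written as nested Python for-loops can often be collapsed into one batched matrix operation. A minimal sketch of the general technique (not DNTR's actual geo/sem loss; names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def pairwise_cosine(feats):
    """All-pairs cosine similarity in one matmul, replacing a nested
    for-loop over feature pairs.

    feats: (N, C) feature vectors
    returns: (N, N) similarity matrix
    """
    normed = F.normalize(feats, dim=1)
    return normed @ normed.t()

feats = torch.randn(8, 16)
sim = pairwise_cosine(feats)
# sim[i, j] equals the cosine similarity of feats[i] and feats[j];
# the diagonal is 1 (each feature compared with itself).
```

A contrastive loss over all pairs can then be computed from `sim` with masked reductions, keeping the whole computation in a handful of kernels instead of O(N²) Python iterations.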