flyingGH opened 1 year ago
@flyingGH Thank you for your interest in the work!
I see that I didn't put the training time in the paper, but I have added more details to my thesis. Specifically, I discuss training in Section 6.4.2 and have included some plots of the training loss in the appendix (http://asrl.utias.utoronto.ca/~mgr/thesis/Gridseth_Mona_202211_PhD_thesis.pdf).
We started by training on the UTIAS Multiseason dataset before fine-tuning with the UTIAS In-The-Dark data. We trained on 391 × 10,000 = 3.91 million samples from the UTIAS Multiseason dataset, which took approximately 14.5 days, and on another 194 × 10,000 = 1.94 million samples from UTIAS In-The-Dark, which took a bit over 6.5 days. I trained the network for a long time, and I probably could have gotten away with less, as can be seen from the loss curves in Appendix A of the thesis. I ran out of time before I could investigate how little training would still give a comparable result.
Moreover, I did not get the time to experiment much with pretrained layers. I imagine that initializing from pretrained layers, instead of training the network completely from scratch, might improve results and also reduce the total training time needed. This would require some exploration of resizing some network layers so that they better fit pretrained models such as VGG16. I had to strike a balance between the size of the layers I used and the size of the resulting descriptors: one downside of the current network design is that the descriptor size depends on the layer sizes. (Alternative approaches such as ASLFeat or R2D2 manage to limit the descriptor size through their network design.)
Because the training is done end-to-end with a full pose estimation pipeline, it requires a fair amount of GPU memory, so I trained with as large a batch size as I could fit on a single GPU. I did not attempt to spread the training across multiple GPUs to speed it up, as this seemed more complex here than for a network trained with a loss directly on the network output (i.e., without the additional pose estimation). The simplest ways to speed up training are likely, as you suggested, finding the minimum number of samples needed for the desired performance, and possibly exploring pretrained layers, which may make the initial training faster and easier than training from scratch. There may also be ways of speeding up the code itself that I have not considered and that more skilled programmers may be aware of.
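One generic workaround for a memory-limited batch size, without touching multi-GPU training, is gradient accumulation: run several small forward/backward passes and apply one optimizer step. This is a standard PyTorch pattern, not something from this repository; the toy linear model below merely stands in for the full matching + pose-estimation pipeline.

```python
import torch
import torch.nn as nn

# Toy model and random data stand in for the real pipeline;
# only the accumulation pattern is the point here.
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4  # effective batch = accum_steps * per-step batch size
opt.zero_grad()
for step in range(8):
    x = torch.randn(2, 8)   # small per-step batch that fits in memory
    y = torch.randn(2, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so grads average
    loss.backward()         # gradients accumulate across steps
    if (step + 1) % accum_steps == 0:
        opt.step()          # one update per effective batch
        opt.zero_grad()
```

The trade-off is wall-clock time rather than memory: each effective batch costs `accum_steps` forward/backward passes, so this helps optimization stability more than raw speed.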
I am really sorry for replying after such a long time. Reducing the number of training epochs works, as suggested by Section 6.4.2 of your thesis. Next, I plan to read your full thesis to gain a better understanding of SLAM systems and DL-enhanced SLAM. Have you thought about building on this work to get a 3D reconstruction of the garden?
Thank you for sharing your great work. I would like to know the training time on your Tesla V100 platform. Are there any ways to speed up the training process, such as using the smallest number of training images needed to get a comparable result? I look forward to your reply; it would be quite helpful.