I have encountered two issues while implementing the TopoMLP codebase, and I am seeking assistance in resolving them.
Question 1: GPU Memory Usage Discrepancy
During model training, I noticed a considerable difference in GPU memory usage between TopoMLP and other similar-sized models such as StreamMapNet. At batch size 1, TopoMLP consumed approximately 42 GB of GPU memory, while StreamMapNet consumed only around 7.5 GB. I'm curious about the reasons for this disparity, and in particular whether TopoMLP loads LiDAR data, which might contribute to the increased memory overhead.
Question 2: Convergence Discrepancy
When implementing the TopoMLP code with a modified learning rate (0.1 times the original learning rate), I observed a significant difference in convergence behavior compared to the expected results. Specifically, while the original logs indicated convergence to a loss of approximately 7, my implementation converged to a loss of around 11. Additionally, the lane detection results differed notably. I'm seeking insights into why such discrepancies in convergence occurred.
Our code uses hybrid matching to speed up training, which requires more GPU memory; this accounts for most of the extra memory usage compared to StreamMapNet. To reduce memory usage, you can remove hybrid matching and add flash attention.
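To give a rough feel for why extra queries can be expensive, here is a back-of-the-envelope sketch. It assumes hybrid matching adds extra one-to-many queries to the decoder (as in H-DETR-style hybrid matching) and that naive attention stores a full query-by-query score matrix per head and layer for the backward pass. The query counts, head count, and layer count are hypothetical placeholders, not values taken from the TopoMLP configs, and this covers only one term of the total memory footprint.

```python
def attention_score_bytes(num_queries, num_heads=8, num_layers=6,
                          batch_size=1, bytes_per_elem=4):
    """Bytes needed to keep the Q x Q self-attention score matrices.

    Naive attention materializes one (num_queries x num_queries) matrix
    per head and per layer; memory-efficient kernels such as flash
    attention avoid storing this matrix.
    """
    return (batch_size * num_layers * num_heads
            * num_queries * num_queries * bytes_per_elem)


# Hypothetical query counts, chosen only for illustration.
base_queries = 300    # one-to-one matching queries (assumed)
extra_queries = 1200  # additional one-to-many queries (assumed)

base_bytes = attention_score_bytes(base_queries)
hybrid_bytes = attention_score_bytes(base_queries + extra_queries)

print(f"one-to-one only : {base_bytes / 2**20:.1f} MiB")
print(f"with extra queries: {hybrid_bytes / 2**20:.1f} MiB")
print(f"growth factor   : {hybrid_bytes / base_bytes:.0f}x")
```

Because this term grows with the square of the query count, going from 300 to 1500 queries multiplies it by 25. That is the part flash attention is meant to avoid storing, while removing the extra queries shrinks every per-query activation as well.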
We tuned the learning rate carefully ourselves. Other learning rate settings may not be optimal.
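As a toy illustration only (not TopoMLP's loss or optimizer), the sketch below runs plain gradient descent on f(x) = x^2 for a fixed number of steps. With the step budget unchanged, a learning rate scaled by 0.1 ends at a noticeably higher loss, which is one plausible reason a run at 0.1 times the tuned learning rate could stop at a higher loss than the reference logs. The function name and all values here are made up for the example.

```python
def final_loss(lr, steps=50, x0=10.0):
    """Run gradient descent on f(x) = x**2 and return the final loss."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x   # derivative of x**2
        x -= lr * grad   # plain gradient descent update
    return x * x


loss_base_lr = final_loss(lr=0.1)    # stand-in for the tuned learning rate
loss_small_lr = final_loss(lr=0.01)  # 0.1 times the tuned learning rate

print(f"lr=0.1  -> final loss {loss_base_lr:.3e}")
print(f"lr=0.01 -> final loss {loss_small_lr:.3e}")
```

If the lower learning rate is needed for other reasons, a longer training schedule would be the natural thing to try, though that is an untested suggestion here.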