Great Work!! But also some question about your implementation ?

zjjMaiMai / TinyHITNet

HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching

152 stars 21 forks source link

Great Work!! But also some question about your implementation ? #3

Closed Magicboomliu closed 2 years ago

Magicboomliu commented 2 years ago

Firstly, Thanks for your great work.

I Check the code. And find that you really follow the orginal paper about the initailization and propagation. Here comes my inqures:

Seems That you own do tile initialzation at the highest resolution and do propagation. In the original paper, the author do tile initialization at all 1/4-1/2 resolution. So that in the propagation update stage, you only have 1(n=1) tile hyp at each location, the original paper should have two(one from initialization at current scale, one from lowwer scale upsampled).
I am not sure, the original paper finally use three propagation module to recover the resolution, If I am right, in your code, you use the Refinement Module from stereonet , A simple dilation resblocks instead of Propagation Module to refine and increase the scale. Why you do not use the propagation manner? Did I hard to train or something else?

Finally, Thank you again for your great work. It will be very nice if you can answer my requires. Solute!

zjjMaiMai commented 2 years ago

Hi, @Magicboomliu

Seems That you own do tile initialzation at the highest resolution and do propagation. In the original paper, the author do tile initialization at all 1/4-1/2 resolution. So that in the propagation update stage, you only have 1(n=1) tile hyp at each location, the original paper should have two(one from initialization at current scale, one from lowwer scale upsampled).

I am not sure, the original paper finally use three propagation module to recover the resolution, If I am right, in your code, you use the Refinement Module from stereonet , A simple dilation resblocks instead of Propagation Module to refine and increase the scale. Why you do not use the propagation manner? Did I hard to train or something else?

please check supplementary material 3. Model Architecture Details for more details. BTW, our implementation has exactly the same network structure as the original implementation. so we can convert original HITNet-XL weights to pytorch and achieve 0.3762 EPE on sceneflow. we also trained HITNet from scratch on sceneflow and achieve 0.5486 EPE.

Magicboomliu commented 2 years ago

Thank you very much again for providing the supplementary material and carefull instructions!! I will check it out. It is really helpful!
BTW, in sceneflow training phase, how many epochs you train to achieve 0.5486EPE ? Because, I am little concerned about, if the epoch nums is big, maybe overfit at Kitti2012@2015. Other SOTA, like CSPN,PWCNet training with batch_size =8 smaller than 70 epochs.

zjjMaiMai commented 2 years ago

Thank you very much again for providing the supplementary material and carefull instructions!! I will check it out. It is really helpful! BTW, in sceneflow training phase, how many epochs you train to achieve 0.5486EPE ? Because, I am little concerned about, if the epoch nums is big, maybe overfit at Kitti2012@2015. Other SOTA, like CSPN,PWCNet training with batch_size =8 smaller than 70 epochs.

We follow the training configuration described in the supplementary material 1.1 Training Setup:

We trained for 1.42M iterations using the Adam optimizer, starting from a learning rate of 4e-4, dropping it to 1e-4, then to 4e-5, then to 1e-5 after 1M, 1.3M, 1.4M iterations respectively.

Magicboomliu commented 2 years ago

OK, great!. Thank you . So according to the file and you training script in this repo. I do a little calculation. 1.42M iteration, batch_size =8, Since Sceneflow is 35K samples. so the epoch nums is 1420000/(35400/8) ~~=320 epochs. From my experience, using 2 NVIDIA 2080Ti with batch size=8, 1epoch takes almost 1 hour . So 320 epochs means it takes almost 10 days to train? AM I correct or get some wrong? lol

zjjMaiMai commented 2 years ago

OK, great!. Thank you . So according to the file and you training script in this repo. I do a little calculation. 1.42M iteration, batch_size =8, Since Sceneflow is 35K samples. so the epoch nums is 1420000/(35400/8) ~~=320 epochs. From my experience, using 2 NVIDIA 2080Ti with batch size=8, 1epoch takes almost 1 hour . So 320 epochs means it takes almost 10 days to train? AM I correct or get some wrong? lol

Yes. we use 8 V100 GPUs. LOL

Magicboomliu commented 2 years ago

haha, OK, Fine. I got it. 👍 Thank your very much for your kindness instructions.
BTW, it is a great work, I am considering HItNet for a long time for research purpose, but the original repo only has the test model , other implementation just cannot work well. Your repo is really a great work.

zjjMaiMai commented 2 years ago

Feel free to reopen if any question!