ywyue / AGILE3D

[ICLR 2024] AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
https://ywyue.github.io/AGILE3D/
MIT License

Backbone and Training details #3

Closed · RyanG41 closed this issue 4 months ago

RyanG41 commented 4 months ago

Hi, thanks for the wonderful work and for open-sourcing the code!

I have some confusion about a few of the details, though:

  1. The paper mentions that the feature-extraction backbone is based on the Minkowski Engine, but when I checked the code it actually seems to be a mix of ME and another version of the ResNet architecture from Mask3D. What is the consideration behind this?
  2. The experimental settings say that training is conducted on a single RTX TITAN with 24 GB of memory. However, I encountered a memory error, "MemoryError: std::bad_alloc: cudaErrorMemoryAllocation: out of memory", when running the training script on a single RTX 4090 with the same 24 GB of memory. Were any adjustments made to the released training code, or is it just something wrong with my server?

Thanks in advance for any kind of help!

ywyue commented 4 months ago

Hi, thanks for your interest in our work!

  1. Minkowski Engine was originally proposed in 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. You can find the ResNet architectures in their code: https://github.com/chrischoy/SpatioTemporalSegmentation/tree/master/models (a minimal usage sketch follows after this list).

  2. We can indeed train the model on a single RTX TITAN with 24 GB of memory. I haven't tried training the model on an RTX 4090.
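
For anyone who wants to poke at the backbone building blocks, below is a minimal sketch (not the AGILE3D or SpatioTemporalSegmentation code) of voxelizing a point cloud and running it through one Minkowski Engine sparse convolution block, the kind of layer that Res16UNet34C stacks into a residual U-Net. The voxel size, channel widths, and dummy inputs are illustrative assumptions.

```python
import torch
import MinkowskiEngine as ME

# Dummy point cloud: xyz coordinates in meters and per-point RGB features.
points = torch.rand(10000, 3) * 10.0
colors = torch.rand(10000, 3)

# Quantize to a sparse voxel grid (5 cm voxels here, an arbitrary choice).
voxel_size = 0.05
coords, feats = ME.utils.sparse_quantize(
    coordinates=(points / voxel_size).floor().int(),
    features=colors,
)

# Prepend a batch index and build the sparse tensor that the backbone consumes.
bcoords = ME.utils.batched_coordinates([coords])
x = ME.SparseTensor(features=feats, coordinates=bcoords)

# A single sparse conv block; Res16UNet34C stacks residual blocks of such
# layers in an encoder-decoder (U-Net) layout.
block = torch.nn.Sequential(
    ME.MinkowskiConvolution(3, 32, kernel_size=3, stride=1, dimension=3),
    ME.MinkowskiBatchNorm(32),
    ME.MinkowskiReLU(),
)
y = block(x)
print(y.F.shape)  # (num_occupied_voxels, 32)
```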

Let me know if you have further concerns!

RyanG41 commented 4 months ago

Hi, thanks for the reply.

For the first question, what I actually intended to ask is why you ultimately chose a ResNet architecture similar to Mask3D's instead of the one in "4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks", since InterObject3D seems to use the architecture from the Minkowski Engine.

For the second one, it was probably an issue with my server; the code now runs normally after several tries. May I also ask how long the training takes on your side? It has been running for two days and has only just reached half of the 1100 epochs. Also, why choose the number 1100? Is it empirical, i.e., did you find experimentally that training for 1100 epochs gives the best performance?

Thanks again for the wonderful work, it really helps.

ywyue commented 4 months ago

Hi,

  1. "4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks" provides several arcs, e.g. Res16UNet34A, Res16UNet34B, Res16UNet34C. As explained in our paper (see Model setting in A.2 IMPLEMENTATION DETAILS on page 13), we use the same Minkowski Res16UNet34C (Choy et al., 2019) backbone as InterObject3D and Mask3D. You may notice the arc code is slightly different, but the name/number of backbone parameters are exactly the same.

  2. Training for 1100 epochs on a single RTX TITAN with 24 GB of memory takes ~5-6 days. If you want to speed up training, I would suggest using higher-end GPUs or multi-GPU training. The number of epochs was chosen experimentally so that the model converges while avoiding overfitting.
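
If it helps, a quick way to convince yourself that two Res16UNet34C implementations define the same network is to compare parameter names, shapes, and totals. Below is a hedged sketch; the commented-out import paths and constructor arguments are placeholders, since they depend on how each codebase is laid out.

```python
import torch

def describe(model: torch.nn.Module):
    """Return {parameter name: shape} and the total parameter count."""
    shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
    total = sum(p.numel() for p in model.parameters())
    return shapes, total

# Placeholder imports and arguments, to be adapted to the actual codebases:
# from agile3d_models import Res16UNet34C as BackboneA
# from spatio_temporal_segmentation_models import Res16UNet34C as BackboneB
# shapes_a, total_a = describe(BackboneA(in_channels=3, out_channels=96))
# shapes_b, total_b = describe(BackboneB(in_channels=3, out_channels=96))
# print(total_a == total_b, shapes_a == shapes_b)
```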

RyanG41 commented 4 months ago

Hi,

Thanks for the prompt reply! This clears things up.