isl-org / DPT

Dense Prediction Transformers
MIT License

Thanks, and is there a plan to release the training code? #3

jucic opened this issue 3 years ago (Open)

lxtGH commented 3 years ago

I have the same question. Will you release the training code for reference?

ranftlr commented 3 years ago

We plan to release the training code. I can't give an exact timeline at the moment, but I hope that we'll be able to do this within one or two months.

Tord-Zhang commented 3 years ago

@ranftlr Wonderful work! Also looking forward to the training code. May I ask how many GPUs you used for training, and how long it took to train the model?

ranftlr commented 3 years ago

@Tord-Zhang: We typically train on 4 Quadro 6000 cards with 24 GB of memory each. A complete run to produce the final model takes about 5 days.

@angrysword: Sorry, I don't understand your question. Can you elaborate more on the problem that you are observing?

Tord-Zhang commented 3 years ago

@ranftlr Hi, thanks for your quick response. I am a little surprised by the training speed, since the MGDA training algorithm is used, in which a minibatch from each dataset needs two forward computations in each iteration, and the datasets are also very large. Would a Quadro 6000 be faster than a Tesla V100? BTW, could I ask when the training code will be released? Thanks!
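
For context, here is a toy sketch of the two forward/backward passes I mean, reduced to two datasets with the closed-form min-norm weights. Everything below (names, batch format, loss) is hypothetical and not the authors' code:

```python
import torch

def mgda_step(model, loss_fn, batch_a, batch_b, optimizer):
    # One backward pass per dataset to obtain per-dataset gradients
    # of the shared parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for batch in (batch_a, batch_b):
        optimizer.zero_grad()
        loss_fn(model(batch["image"]), batch["depth"]).backward()
        grads.append(torch.cat([p.grad.detach().flatten() for p in params]))

    g1, g2 = grads
    # Minimize ||a*g1 + (1-a)*g2||^2 over a in [0, 1]; the unconstrained
    # optimum is a = (g2 - g1).g2 / ||g1 - g2||^2, clamped to [0, 1].
    denom = (g1 - g2).pow(2).sum().clamp(min=1e-12)
    a = torch.clamp(torch.dot(g2 - g1, g2) / denom, 0.0, 1.0)

    # Write the combined gradient back into the parameters and update once.
    combined = a * g1 + (1.0 - a) * g2
    offset = 0
    for p in params:
        n = p.numel()
        p.grad.copy_(combined[offset:offset + n].view_as(p))
        offset += n
    optimizer.step()
```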

ranftlr commented 3 years ago

We don't go through all the images in every "epoch". Since the sizes of the individual datasets can differ by an order of magnitude, we use a resampling strategy that assembles mini-batches in equal parts (on average) from every dataset. This also plays well with the diversity of the individual datasets: the large datasets typically have a lot of similar frames, as the frames come from videos, whereas the smaller datasets tend to have a lot of uncorrelated images. Based on this, we perform a fixed number of total steps. We define an "epoch" as seeing 72000 samples and train for 2x 60 epochs - once for pre-training and once for the run over the full dataset. Please have a look at the MiDaS paper for more details (https://arxiv.org/abs/1907.01341).
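
For illustration, a minimal sketch of this kind of equal-parts resampling in PyTorch (toy datasets and numbers, not our actual training code):

```python
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Three toy datasets whose sizes differ by an order of magnitude each.
sizes = [100000, 10000, 1000]
datasets = [TensorDataset(torch.randn(n, 8)) for n in sizes]

# Weight every sample by 1/len(its dataset): each dataset then carries the
# same total probability mass, so mini-batches are drawn in equal parts
# (on average) from every dataset.
weights = torch.cat([torch.full((len(d),), 1.0 / len(d)) for d in datasets])

# One "epoch" = 72000 samples drawn with replacement.
sampler = WeightedRandomSampler(weights, num_samples=72000, replacement=True)
loader = DataLoader(ConcatDataset(datasets), batch_size=8, sampler=sampler)
```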

I don't have a comparison to a V100, as I don't have any available.

There is still no exact ETA for the training code.

Tord-Zhang commented 3 years ago

@ranftlr Did you mean sampling an equal number of images from each dataset in every batch, or sampling different numbers of images but in the same proportion on average? As for the training, did you use DP or DDP when training with the MGDA algorithm? I am not sure whether MGDA supports DDP.

Tord-Zhang commented 3 years ago

@ranftlr And which version of BlendedMVS did you use: BlendedMVS, BlendedMVS+ or BlendedMVS++? High resolution or low resolution? I found that there are some unpleasant noise points in the ground truth of the low-resolution BlendedMVS; how did you deal with those points? Thanks.

eliabruni commented 3 years ago

Hello @ranftlr, still no ETA for the training code? We're considering writing our own, but that would be undesirable, especially if you are planning to release your original code.

soroushseifi commented 3 years ago

Hi @ranftlr, thanks for your impressive work. I would also like to mention that we at KU Leuven's PSI lab are looking forward to the training code; otherwise we will need to write our own as well. If you could give an indication of when you expect to release the code, we would have a better idea of how to organize our work.

Tord-Zhang commented 3 years ago

@ranftlr Hi, still no ETA for the training code?

chrisdottel commented 3 years ago

@eliabruni @soroushseifi @Tord-Zhang

Any luck writing the training code? I am not sure how laborious it would be, but I am thinking about writing it. If no one has tried yet, does anyone want to help? I want to try to fine-tune the model on the NYU dataset, but with edges only, and see how well this model can estimate depth on line images.

eliabruni commented 3 years ago

@chrisdottel maybe you can have a look at https://github.com/open-mmlab/mmsegmentation/tree/master/configs/dpt (there is a training schedule in dpt_vit-b16_512x512_160k_ade20k.py)

yassineAlouini commented 2 years ago

Did anyone manage to do transfer learning with the pretrained DPT-Large or DPT-Hybrid models on another depth dataset? Is the loss described in this paper a good one to use for this transfer learning?

[image: vns_transfer_learning_loss]
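
If the image above shows the scale- and shift-invariant loss from the MiDaS paper (which DPT also uses for depth), a minimal sketch could look like this; the function name and tensor layout are my own assumptions:

```python
import torch

def ssi_loss(pred, target, mask):
    """Scale- and shift-invariant MSE in the spirit of MiDaS.

    pred, target: (B, H, W) prediction and ground truth; mask: (B, H, W)
    bool, True where ground truth is valid. Hypothetical sketch.
    """
    m = mask.float()
    n = m.sum(dim=(1, 2)).clamp(min=1)

    # Closed-form least squares for per-image scale s and shift t that
    # align pred to target on the valid pixels.
    sum_p = (m * pred).sum(dim=(1, 2))
    sum_t = (m * target).sum(dim=(1, 2))
    sum_pp = (m * pred * pred).sum(dim=(1, 2))
    sum_pt = (m * pred * target).sum(dim=(1, 2))
    det = (sum_pp * n - sum_p * sum_p).clamp(min=1e-6)
    s = (sum_pt * n - sum_p * sum_t) / det
    t = (sum_pp * sum_t - sum_p * sum_pt) / det

    # Mean squared error over valid pixels after alignment.
    aligned = s.view(-1, 1, 1) * pred + t.view(-1, 1, 1)
    return ((m * (aligned - target) ** 2).sum(dim=(1, 2)) / n).mean()
```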

Is unfreezing the last layer enough or should we unfreeze more?
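
In case it helps, here is roughly the kind of setup I had in mind: freeze the ViT backbone and train only the head. `DPTDepthModel` and the `pretrained`/`scratch` attributes come from this repo's dpt/models.py, but the weights path and hyperparameters below are assumptions to double-check:

```python
import torch
from dpt.models import DPTDepthModel  # from this repository

# Load DPT-Large with the released MiDaS weights (verify the exact
# filename against the repo's README; this path is an assumption).
model = DPTDepthModel(
    path="weights/dpt_large-midas-2f21e586.pt",
    backbone="vitl16_384",
    non_negative=True,
)

# Freeze everything, then unfreeze only the decoder/output head. In this
# repo the ViT encoder lives in `model.pretrained` and the refinement
# blocks plus output head in `model.scratch`.
for p in model.parameters():
    p.requires_grad = False
for p in model.scratch.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```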

Any pointers/tips are welcome, thanks in advance. :slightly_smiling_face:

antocad commented 2 years ago

Hi, we have re-implemented the paper presented in this repository, and we have added a training script. Check it out here: https://github.com/antocad/FocusOnDepth

yassineAlouini commented 2 years ago

@antocad Thanks for this. :ok_hand:

Tord-Zhang commented 2 years ago

So there is no plan to release the training code?