naver / dust3r

DUSt3R: Geometric 3D Vision Made Easy
https://dust3r.europe.naverlabs.com/

How to train from scratch or fine-tune? #144

Open VillardX opened 1 month ago

VillardX commented 1 month ago

Thank you for your great work!

In my situation, I want to train dust3r from scratch or fine-tune it on my own data. You provide the Co3d preprocessed data demo, but I still have some questions.

  1. About the depths: is the depth PNG in millimeters?
  2. About the camera parameters (K, R, T) in the ".npz" file: in "camera_pose.npy", is the R matrix W2C or C2W? And is the T vector in the same unit as "maximum_depth.npy", i.e. both in millimeters? (A snippet of how I currently read these files follows this list.)
  3. About the amount of data: how many training image pairs does it take for training/fine-tuning to really work? When I train/fine-tune on my own data, the loss becomes smaller than 0, and the results are far from the pretrained model you provided. Is that reasonable?
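
For context, here is roughly how I read one preprocessed frame right now (a sketch; the file names are placeholders and the npz keys are what I see in the Co3d demo data):

```python
import cv2
import numpy as np

# Metadata for one frame (file names are placeholders, not real paths).
meta = np.load("frame_000001.npz")
K = meta["camera_intrinsics"]            # 3x3 intrinsics
pose = meta["camera_pose"]               # 4x4 -- W2C or C2W? (question 2)
max_depth = float(meta["maximum_depth"])

# 16-bit depth PNG for the same frame.
raw = cv2.imread("frame_000001.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
# Question 1: is `raw` already in millimeters, or is it normalized so that
# depth = raw / 65535 * max_depth ?
```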

Hoping for your help, thanks.

yocabon commented 1 month ago

Hi,

  1. In Co3d, I think the depth is not metric. For training dust3r, the scale of the depth doesn't matter (it does matter for training mast3r, see https://github.com/naver/mast3r/blob/main/mast3r/datasets/__init__.py).
  2. dust3r uses C2W, and yes, it should be in the same scale as the depth.
  3. You can see some training details in #140. That being said, we didn't train using the publicly released code, so there could be some bugs. Having a loss < 0 is expected; the confidence loss causes that (a sketch of why follows this list). The Co3d demo is not meant to produce good models; it was added so that people could make sure the training code runs. 1000 pairs really isn't enough: our first ablations were run on a dataset with 10_000 pairs, but we quickly increased that number to the figures you see there.
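
To see why a negative loss is normal, here is the shape of the confidence-weighted objective (a minimal sketch with simplified names; alpha=0.2 matches the demo training criterion ConfLoss(Regr3D(...), alpha=0.2)):

```python
import torch

def confidence_loss(regr_err, raw_conf, alpha=0.2):
    # Sketch of the confidence-weighted regression loss from the paper:
    # each pixel's error is scaled by a predicted confidence, with a
    # log-confidence regularizer subtracted.
    conf = 1.0 + raw_conf.exp()      # conf >= 1, cf. conf_mode=('exp', 1, inf)
    return (conf * regr_err - alpha * conf.log()).mean()

err = 0.01 * torch.rand(1000)        # small per-pixel regression errors
raw = torch.randn(1000)              # raw confidence logits
print(confidence_loss(err, raw))     # goes negative once errors are small
```

Once the regression term shrinks, the -alpha * log(conf) term dominates and the total dips below zero.
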
VillardX commented 1 month ago

Thanks for your reply.

I think the preprocessing of my own data matches what you describe in 1. and 2. However, when I train from scratch or fine-tune from the "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" checkpoint you provide, the result in gradio gets even worse. Following your point 3., could the low quality of fine-tuning be due to bugs in the publicly released code? The gradio demo.py works correctly, though, which suggests there are no bugs in the inference step?

I train/fine-tune on a single 4090 card; my training set contains about 130,000 image pairs.

For training from scratch, I use batch_size=8, and the Scale_ShiftInv_pts3d_med metric behaves very strangely.

[three screenshots of training curves for Scale_ShiftInv_pts3d_med]

If this is not due to the training bugs you mentioned in your reply, I'd appreciate any suggestions.

puyiwen commented 1 month ago

@VillardX Hi, when I train using only ARKitScenes data and evaluate on Co3d data, the Regr3D_ScaleShiftInv_pts3d_2_med metric is strange. Have you solved the problem?

yocabon commented 1 month ago

> train only using ARKitScenes data and evaluate on Co3d data

ARKitScenes and Co3d are two very different datasets (indoor rooms vs. object-centric). The fact that a model trained on ARKitScenes doesn't perform well on Co3d isn't too surprising.

On my end, I launched a small training experiment (224 linear on ARKitScenes, StaticThings3D, ScanNetpp) using the released code, and the loss/validation on ARKitScenes are going down nicely.
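
For reference, the ScaleShiftInv metrics only compare pointmaps up to a per-view scale and shift, so their absolute values can look odd across very different datasets. A rough sketch of the idea (simplified; the released Regr3D_ScaleShiftInv differs in the exact normalization):

```python
import torch

def scale_shift_inv_err(pred, gt):
    # Center each (N, 3) pointmap and normalize by its average distance,
    # so the comparison ignores global translation and scale.
    def normalize(x):
        x = x - x.median(dim=0).values    # remove shift
        return x / x.norm(dim=-1).mean()  # remove scale
    return (normalize(pred) - normalize(gt)).norm(dim=-1).mean()
```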

JayKarhade commented 1 month ago

@yocabon I also tried the training script, using the entire MegaDepth dataset with a larger batch size of 128 and more epochs.

While the loss goes down, the results seem to get progressively worse qualitatively, even for training set data.

JayKarhade commented 1 month ago

[two input images: 4200622249_0d2a0511ec_o.jpg and 7108185213_c7c7e7a899_o.jpg]

For those 2 images, the default pre-trained dust3r output is (which looks reasonable):

[screenshot: reconstruction from the pre-trained model]

However, after fine-tuning with these 2 images included in the training set, the output becomes worse:

[two screenshots: degraded reconstructions after fine-tuning]

puyiwen commented 1 month ago

> train only using ARKitScenes data and evaluate on Co3d data
>
> ARKitScenes and Co3d are two very different datasets (indoor rooms vs. object-centric). The fact that a model trained on ARKitScenes doesn't perform well on Co3d isn't too surprising.
>
> On my end, I launched a small training experiment (224 linear on ARKitScenes, StaticThings3D, ScanNetpp) using the released code, and the loss/validation on ARKitScenes are going down nicely.

@yocabon Thank you for your reply! Do you mean that the generalization of dust3r is limited and depends on the training data? I used the pre-trained model you provided and tested it on my own dataset, and it performed very well; the absolute depth estimation is accurate. Does this mean the model's strong generalization comes from the large and diverse training data?

yocabon commented 1 month ago

Yes, dust3r is data-driven. It is very important to train it on large, diverse datasets.
About MegaDepth: if the training loss goes down, then maybe it's forgetting some of its previous training, and maybe replicating some of the inaccuracies in MegaDepth's ground truth.
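
If forgetting is the problem, a common mitigation is to fine-tune on a mixture of the new pairs and a replay of the original training data rather than on MegaDepth alone. In plain PyTorch terms the idea looks like this (a generic sketch with stand-in tensors, not the repo's actual dataset plumbing):

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for real pair datasets: `new_pairs` would be the MegaDepth-style
# data, `replay_pairs` a subsample of the original training mixture.
new_pairs = TensorDataset(torch.randn(1000, 8))
replay_pairs = TensorDataset(torch.randn(1000, 8))

# shuffle=True interleaves old and new pairs in every batch, so the model
# keeps seeing the original distribution while adapting to the new data.
loader = DataLoader(ConcatDataset([new_pairs, replay_pairs]),
                    batch_size=8, shuffle=True)
```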

puyiwen commented 1 month ago

@yocabon Thank you for your reply, I have another question. I see that CroCo v2 is pretrained on 3D vision task datasets; if I pretrained CroCo v2 on ImageNet instead, what would the effect on dust3r be?

puyiwen commented 1 month ago

@VillardX Sorry to bother you, but I want to know how to make image pairs from custom data. Can you help me? Thank you!
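
For context, the naive approach I am considering is to pair frames that are close in an ordered capture sequence, so nearby frames share enough overlap. A minimal sketch (the helper is my own, nothing from the dust3r codebase):

```python
def make_pairs(frame_paths, window=5):
    # Pair each frame with its `window` successors, assuming the list is
    # ordered by capture time so nearby frames overlap enough.
    pairs = []
    for i in range(len(frame_paths)):
        for j in range(i + 1, min(i + 1 + window, len(frame_paths))):
            pairs.append((frame_paths[i], frame_paths[j]))
    return pairs

# e.g. 10 frames with window=2 -> (0,1), (0,2), (1,2), (1,3), ...
print(make_pairs([f"frame_{k:04d}.jpg" for k in range(10)], window=2))
```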