VillardX opened this issue 1 month ago
Hi,
Thanks for your reply.
I think the preprocessing of my own data matches what you describe in points 1 and 2. However, when I train from scratch or finetune from the "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" checkpoint you provide, the results in Gradio get even worse. Following your point 3, can I conclude that the poor finetuning quality comes from bugs in the publicly released code? The Gradio demo.py works correctly, though, which suggests there are no bugs in the inference step.
I train/finetune on a single 4090 card; my training set contains about 130,000 image pairs.
For training from scratch, I use a batch size of 8, and the Scale_ShiftInv_pts3d_med metric behaves very strangely.
If this is not due to the training bugs you mentioned in your reply, I would appreciate any suggestions.
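In case it clarifies what I mean by "strange": below is my rough understanding of what a scale/shift-invariant pointmap metric computes. This is only a sketch; the exact normalization in the released code may differ, and the function name is my own.

```python
import numpy as np

def scale_shift_inv_regr3d(pred_pts, gt_pts, valid):
    """Sketch of a scale/shift-invariant 3D regression error.

    pred_pts, gt_pts: (N, 3) pointmaps flattened over pixels.
    valid: (N,) boolean mask of pixels with ground truth.
    Illustration only; the released implementation may normalize differently.
    """
    pred = pred_pts[valid]
    gt = gt_pts[valid]
    # remove shift: center both point clouds
    pred = pred - pred.mean(axis=0)
    gt = gt - gt.mean(axis=0)
    # remove scale: normalize by the average point norm
    pred = pred / (np.linalg.norm(pred, axis=1).mean() + 1e-8)
    gt = gt / (np.linalg.norm(gt, axis=1).mean() + 1e-8)
    # per-pixel Euclidean error; a "_med" metric would report the median
    err = np.linalg.norm(pred - gt, axis=1)
    return np.median(err)
```

By construction this error is zero for any rescaled/shifted copy of the ground truth, so a large value really does indicate a geometry mismatch rather than a units mismatch.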
@VillardX Hi, when I train using only ARKitScenes data and evaluate on Co3d data, the Regr3D_ScaleShiftInv_pts3d_2_med metric is strange. Have you solved the problem?
train using only ARKitScenes data and evaluate on Co3d data
ARKitScenes and Co3d are two very different datasets (indoor rooms vs object-centric). The fact that a model trained on ARKitScenes doesn't perform well on Co3d doesn't seem too surprising.
On my end, I launched a small training experiment (224 linear on ARKitScenes, StaticThings3D, ScanNetpp) using the released code, and the training/validation losses on ARKitScenes are going down nicely.
@yocabon I also tried the training script, using the entire MegaDepth dataset with a larger batch size of 128 and more epochs.
While the loss goes down, the results get progressively worse qualitatively, even on training-set data.
For these 2 images, the default pre-trained DUSt3R output looks reasonable:
However, after fine-tuning, for these same 2 images from the training set, the output becomes worse:
@yocabon Thank you for your reply! Do you mean that DUSt3R's generalization is limited and depends on the training data? I used the pre-trained model you provided and tested it on my own dataset, and it performed very well; the absolute depth estimation is accurate. Does this mean the model's strong generalization comes from the large and diverse training data?
Yes, DUSt3R is data-driven; it is very important to train it on large, diverse datasets.
About MegaDepth: if the training loss goes down, then maybe the model is forgetting some of its previous training, and maybe replicating some of the inaccuracies in MegaDepth's ground truth.
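If forgetting is the issue, one generic mitigation (a sketch of a standard trick, not something from the released code; all names here are my own) is to replay a fraction of the original training mixture in each finetuning batch:

```python
import random

def mixed_batches(new_pairs, old_pairs, batch_size=8, old_fraction=0.25, seed=0):
    """Yield finetuning batches that replay some of the original data.

    new_pairs: image pairs from the new dataset (e.g. a MegaDepth subset).
    old_pairs: pairs sampled from the original training mixture.
    old_fraction: share of each batch drawn from old_pairs, to reduce
    catastrophic forgetting. Purely illustrative.
    """
    rng = random.Random(seed)
    n_old = max(1, int(batch_size * old_fraction))
    n_new = batch_size - n_old
    while True:
        batch = rng.sample(new_pairs, n_new) + rng.sample(old_pairs, n_old)
        rng.shuffle(batch)  # avoid a fixed new/old ordering within the batch
        yield batch
```

Even a small replay fraction, combined with a lower finetuning learning rate, often keeps the pre-trained behavior from degrading on data outside the new dataset.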
@yocabon Thank you for your reply; I have another question. I see that CroCo v2 is pretrained on 3D vision datasets. If I pretrained CroCo v2 on ImageNet instead, what would the effect on DUSt3R be?
@VillardX Sorry to bother you. I want to know how to make image pairs from custom data. Can you help me? Thank you!
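For context, this is roughly how I imagined pairing could work for sequential frames (my own sketch, not the official preprocessing; the window size is a guess that depends on frame rate and camera motion):

```python
from itertools import combinations

def make_pairs(frame_ids, max_gap=3):
    """Build symmetric image pairs from an ordered frame list.

    Pairs every frame with neighbors up to `max_gap` indices away, so the
    two views of each pair overlap enough for two-view matching. Emitting
    both (i, j) and (j, i) is an assumption; drop it if your pipeline
    symmetrizes pairs itself.
    """
    pairs = []
    for i, j in combinations(range(len(frame_ids)), 2):
        if j - i <= max_gap:
            pairs.append((frame_ids[i], frame_ids[j]))
            pairs.append((frame_ids[j], frame_ids[i]))  # both directions
    return pairs
```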
Thank you for your great work!
In my situation, I want to train DUSt3R from scratch or finetune it on my own data. You provide the preprocessed Co3d data demo; however, I still have some questions.
Hoping for some help, thanks.