princeton-vl / DROID-SLAM


Training with only Monocular Data #10

Open kbpachauri opened 2 years ago

kbpachauri commented 2 years ago

Hi,

In the paper it is mentioned that DROID-SLAM trained on monocular data generalizes to stereo as well as to RGB-D. But when I checked the training code, especially with the TartanAir data, it looks like we need depth as well.

Can you please confirm whether this understanding is correct, or am I missing something?

Thanks!

Best, KP!

zachteed commented 2 years ago

Hi KP, your understanding is correct, we do use RGB-D images for training. When we say it is trained on monocular data, what we mean is that the network is only given monocular data as input. This contrasts with inference time, when depth and stereo can also be used as input.

kbpachauri commented 2 years ago

Thanks for confirmation!

Best, KP

kbpachauri commented 2 years ago

@zachteed I generated depth using monodepth2 for the data on which I am trying to train DROID-SLAM. Training started and completed 1000 iterations, and I saved the checkpoint, but it looks like training is stuck on some error, as the results on the training data look much worse than those of the already-trained model.

I also saw the following messages while training. Does this mean something is wrong with the data or the training settings?

```
torch.linalg.cholesky: For batch 0: U(19,19) is zero, singular U.
torch.linalg.cholesky: For batch 0: U(4,4) is zero, singular U.
torch.linalg.cholesky: For batch 0: U(16,16) is zero, singular U.
(the last message is repeated many more times)
```

Any advice?

Thanks.

Best, KP!

zachteed commented 2 years ago

It's normal to occasionally encounter singular matrices during training. The training code detects when these occur and will not compute gradients through the system, so this shouldn't be the cause of your issue.
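
For anyone wondering what "detect and skip" can look like, here is a minimal PyTorch sketch of the idea (an illustration only, not the repository's actual solver code): attempt the factorization and, if the system matrix is singular, return a zero update with gradients blocked.

```python
import torch

def solve_spd(H, b):
    """Solve H x = b for a symmetric positive-definite system matrix H.

    Minimal sketch of the behaviour described above: if the Cholesky
    factorization fails because H is singular, skip the update and
    detach it so no gradients flow through the degenerate system.
    """
    try:
        L = torch.linalg.cholesky(H)
        return torch.cholesky_solve(b, L)
    except RuntimeError:
        # Singular system: return a zero step and block gradient flow.
        return torch.zeros_like(b).detach()
```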

Could you provide more information about your training setup? Are you using depth from monodepth2 and ground-truth poses? If you have ground-truth poses, I would recommend removing the "flow loss", which is the only loss that requires depth, and training only with the other 2 losses. I've never trained from scratch supervising only on pose, but I was able to get good results fine-tuning a pretrained model using pose supervision only (no depth required).
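
In code terms, this amounts to zeroing the flow-loss weight. A minimal sketch, assuming w3 is the flow-loss weight (the w1/w2/w3 defaults are the ones quoted later in this thread; the exact weight-to-loss mapping here is an assumption, not taken verbatim from the repository):

```python
# Hypothetical sketch: combine the training losses with the flow term
# disabled so that no ground-truth depth is needed.
def total_loss(pose_loss, residual_loss, flow_loss,
               w1=10.0, w2=0.01, w3=0.05, use_flow_loss=False):
    loss = w1 * pose_loss + w2 * residual_loss
    if use_flow_loss:  # the flow loss is the only term requiring depth
        loss = loss + w3 * flow_loss
    return loss
```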

kbpachauri commented 2 years ago

Thanks, @zachteed for the quick reply!

Are you using depth from monodepth2 and ground truth poses?

Yes, depth from monodepth2 and ground-truth poses from the code masters data.

Could you provide more information about your training setup?

Below are details for my training setup.

I am trying to train the model on the code masters data, which gives all the pose information; we converted it to EuRoC format.

For training, I am using all the default parameters:

```
edges=24, fmax=96.0, fmin=8.0, gpus=1, iters=15, lr=0.00025, n_frames=7,
name='bla', noise=False, restart_prob=0.2, scale=False, steps=250000,
w1=10.0, w2=0.01, w3=0.05, world_size=1
```

The first error I figured out is that our pose file has an extra timestamp column and is in EuRoC format. So I changed the lines below in the tartan.py file:

```python
# original order: tx ty tz qx qy qz qw
poses = poses[:, [1, 2, 0, 4, 5, 3, 6]]

# codemasters order: timestamp tx ty tz qw qx qy qz
poses = poses[:, [2, 3, 1, 6, 7, 5, 4]]
```

But after the above change, I always get the message below (which is printed by line 28 in factory.py):

Dataset tartan has 0 images.

I also make sure to delete the cache file before each run, which is saved as "DROID-SLAM/droid_slam/data_readers/cache/TartanAir.pickle".

On further debugging, I found that a condition in _build_dataset_index in base.py at line 60 is never triggered. Also, when I checked the scene in _build_dataset in tartan.py, it has all the images, so it looks to be a problem with some setting.

Now I increased max_flow in build_frame_graph from 256 to 1000; the graph is built and the tartan dataset now has some 2600 images.
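
For context, my understanding (an assumption from the argument names, not from reading every line of the code) is that build_frame_graph keeps frame pairs whose mean induced optical flow stays below a threshold, so raising max_flow admits pairs with larger motion. A toy sketch of that selection rule:

```python
import numpy as np

def select_edges(mean_flow, max_flow=256.0):
    """Toy illustration of flow-based edge selection for a frame graph.

    mean_flow: (N, N) array with the mean flow magnitude induced between
    frame pairs (computed elsewhere from depth, poses and intrinsics).
    Keeps pairs below the max_flow threshold; this is only a sketch of
    the filtering idea, not the actual build_frame_graph implementation.
    """
    ii, jj = np.where((mean_flow > 0) & (mean_flow < max_flow))
    return list(zip(ii.tolist(), jj.tolist()))
```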

I trained for 3000 iterations from scratch, and this time I didn't get any singular-matrix issues, but the results seem to be the same.

It looks like I am still missing some setting or information, but I'm not sure what.

I did check TensorBoard: residual, rot_error, and tr_error are decreasing, but f_error is not.

Screenshot 2021-10-15 at 2 50 13 PM

Screenshot 2021-10-15 at 2 50 31 PM

Also, I tried to load the checkpoint for fine-tuning, but checkpoint loading fails. I am using just 1 GPU to train.

```
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
  size mismatch for module.update.weight.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
  size mismatch for module.update.weight.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
  size mismatch for module.update.delta.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
  size mismatch for module.update.delta.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
```

  1. What are bad_rot and bad_tr?
  2. You mentioned training with only the 2 other losses, but how will the frame graph be built if we don't have depth? Do you mean to still use disparity to build the frame graph but disable the flow loss?
  3. How is DEPTH_SCALE decided? Currently, for tartan, it's set to 5, while monodepth2 scales disparity to a depth range of (0.1 to 100) (see the sketch after this list).
  4. How can the pre-trained checkpoint loading be fixed?
  5. Do I need to change something for the ground-truth poses, or is there some other information I am missing?
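
Regarding question 3, here is how monodepth2 converts its sigmoid disparity output to depth (this mirrors monodepth2's own disp_to_depth convention with min_depth=0.1 and max_depth=100). How the result should interact with DEPTH_SCALE in the TartanAir reader is exactly the open question, so treat the final scaling comment as an assumption to verify rather than an answer.

```python
def monodepth2_disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Convert monodepth2's sigmoid disparity output to depth.

    Follows monodepth2's disp_to_depth(): rescale disparity into
    [1/max_depth, 1/min_depth] and take the reciprocal. Note that
    monocular monodepth2 depth is only defined up to scale.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

# Assumed usage before handing depth to the dataloader; whether dividing
# by DEPTH_SCALE (5 for TartanAir) is appropriate for monodepth2 output
# is the unresolved part of question 3.
# depth = monodepth2_disp_to_depth(disp) / DEPTH_SCALE
```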

Thanks!

Best, KP!

kbpachauri commented 2 years ago

I was able to fix the pre-trained (droid.pth) model loading issue by changing the model loading in train.py, lines 57-58, from

```python
if args.ckpt is not None:
    model.load_state_dict(torch.load(args.ckpt))
```

to

```python
if args.ckpt is not None:
    state_dict = torch.load(args.ckpt)
    # Keep only the first 2 output channels so the checkpoint matches the current model.
    state_dict["module.update.weight.2.weight"] = state_dict["module.update.weight.2.weight"][:2]
    state_dict["module.update.weight.2.bias"] = state_dict["module.update.weight.2.bias"][:2]
    state_dict["module.update.delta.2.weight"] = state_dict["module.update.delta.2.weight"][:2]
    state_dict["module.update.delta.2.bias"] = state_dict["module.update.delta.2.bias"][:2]
    model.load_state_dict(state_dict)
```

But still, I didn't see much difference in the output; it looks like something is wrong with my data or with how I am feeding the data to the network.

Thanks!

Best, KP

kbpachauri commented 2 years ago

@zachteed

I disabled flow_loss by setting w3 = 0. Also, in the training loop I disabled the build_frame_graph code which uses depth to create the frame graph:

```python
if np.random.rand() < 0.5:
    graph = build_frame_graph(poses, disps, intrinsics, num=args.edges)
```
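
If depth is unavailable, one simple depth-free replacement is to connect each frame to its temporal neighbours. This is only a sketch of one option under that assumption, not necessarily what the training loop falls back to when the depth-based branch is disabled:

```python
from collections import OrderedDict

def temporal_frame_graph(num_frames, radius=2):
    """Depth-free frame graph: connect each frame to its temporal
    neighbours within `radius` frames. A sketch of a possible stand-in
    for the depth-based build_frame_graph call shown above.
    """
    graph = OrderedDict()
    for i in range(num_frames):
        graph[i] = [j for j in range(num_frames)
                    if j != i and abs(i - j) <= radius]
    return graph
```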

The residual error, rotation error, and translation error seem to be decreasing.

Screenshot 2021-10-16 at 10 41 30 PM

But the error still seems too high, and the output looks like garbage.

Screenshot 2021-10-16 at 10 46 07 PM

I used the same demo.py settings and didn't change anything.

Any further suggestions on what could be wrong?

Thanks!

Best, KP!

kbpachauri commented 2 years ago

@zachteed I fine-tuned the model on a single TartanAir sequence (abandonedfactory, easy) for 2000 iterations, and the inferred results are fine with the fine-tuned model. I did this experiment to make sure the end-to-end process is fine!

Screenshot 2021-10-18 at 10 43 00 PM [Backend Optimization OFF].

Screenshot 2021-10-18 at 10 42 50 PM [Backend Optimization ON].

I also did a new experiment to check whether model fine-tuning works without depth on the TartanAir dataset (abandonedfactory, easy), and from the results I can see I get a similar issue as with my own dataset: fine-tuning does not work.

Screenshot 2021-10-19 at 11 00 52 AM [Backend Optimization OFF].

I am stuck at this point and need your advice to move forward. What could be wrong with the code or the data, or do I need to figure out the right settings for fine-tuning the model without depth?

Thanks.

Best, KP!

louzq16 commented 2 years ago

Depth maps from monodepth2 don't have spatial and temporal consistency. Maybe that is a problem?

jiesico commented 2 years ago

@zachteed Hi! Thank you very much for sharing the code of DROID-SLAM. I've trained for 80,000 iterations on the TartanAir dataset so far, but why hasn't the loss shown any signs of convergence yet?

arkinrc commented 1 year ago

@kbpachauri Hi. How did you succeed in using monodepth2 depth images along with RGB? In my case I get a dimension error for the depth images; the error is below:

```
File "/home/ubuntu/droid2/DROID-SLAM/droid_slam/geom/projective_ops.py", line 130, in induced_flow
    ht, wd = disps.shape[2:]
ValueError: too many values to unpack (expected 2)
```

thanks
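
For reference, this unpacking error usually means the stacked depth tensor has one dimension too many, e.g. monodepth2 output saved as (1, H, W) or (H, W, 1), so disps ends up 5-D where induced_flow expects 4-D. A minimal check under that assumption (a hypothetical helper, not code from the repository):

```python
import numpy as np

def load_depth(path):
    """Load a saved monodepth2 depth/disparity map and drop singleton
    dimensions so the batched tensor stays (batch, frame, height, width).
    Adapt the path and file format to your own data.
    """
    depth = np.load(path)
    depth = np.squeeze(depth)  # (1, H, W) or (H, W, 1) -> (H, W)
    assert depth.ndim == 2, f"unexpected depth shape {depth.shape}"
    return depth.astype(np.float32)
```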

zhangjd1029 commented 1 year ago

@kbpachauri Hello, I have benefited a lot from reading the questions above, thank you. I also want to train on a monocular RGB dataset for which corresponding depth maps and poses can be obtained through AirSim. Have you trained successfully? Considering that it has been a long time and it may be difficult for you to recall the details, may I ask you some simple questions? How should the dataset files be placed when training with the official datasets? How should they be placed when using your own dataset?

  1. How are the files placed when fine-tuning the model (abandonedfactory, easy)? What is the command? I really need your help, thank you very much. Looking forward to your reply.

Junda24 commented 11 months ago

@kbpachauri Hello, I have the same question as you. Could I communicate with you privately? I have some questions to ask you. Here is my email address: 1063062177@qq.com, and my WeChat ID is 18186416709. Please feel free to contact me using whichever method is convenient for you. Thank you very much.

Junda24 commented 11 months ago

@kbpachauri Hello, I have the same question as you. Could I communicate with you privately? My email address is 1063062177@qq.com and my WeChat ID is 18186416709. Thank you very much.

Junda24 commented 11 months ago

Hello, I have been stuck here for a long time now. May I ask if you have solved this problem? If it is convenient, could you share contact information for a more detailed discussion? My WeChat is 18186416709 and my email is 1063062177@qq.com.

Tianci-Wen commented 3 months ago

Hello, have you solved this problem, and how did you arrange the dataset directory structure for training?