kbpachauri opened this issue 3 years ago
Hi KP, your understanding is correct, we do use RGB-D images for training. When we say it is trained on monocular data, what we mean is that the network is only given monocular data as input. This contrasts with inference time, when depth and stereo can also be used as input.
Thanks for the confirmation!
Best, KP
@zachteed I generated depth using monodepth2 for the data on which I am trying to train DROID-SLAM. Training started and completed 1000 iterations, and I saved the checkpoint, but it looks like training is stuck in some error, as the results on the training data look much worse than those from the already-trained model.
I also saw the messages below while training. Does this mean something is wrong with the data or the training settings?
torch.linalg.cholesky: For batch 0: U(19,19) is zero, singular U.
torch.linalg.cholesky: For batch 0: U(4,4) is zero, singular U.
torch.linalg.cholesky: For batch 0: U(16,16) is zero, singular U.
(the last message repeated many more times)
Any advice?
Thanks.
Best, KP!
It's normal to occasionally encounter singular matrices during training. The training code detects when these occur and will not compute gradients through the system, so this shouldn't be the cause of your issue.
Could you provide more information about your training setup? Are you using depth from monodepth2 and ground-truth poses? If you have ground-truth poses, I would recommend removing the "flow loss", which is the only loss that requires depth, and training with only the other two losses. I've never trained from scratch supervising only on pose, but I was able to get good results fine-tuning a pretrained model using pose supervision only (no depth required).
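(For reference, a minimal sketch of what pose-only supervision could look like, assuming the training loop simply combines the three losses with the w1/w2/w3 weights that train.py exposes; the loss variables below are stand-ins, not the repo's actual API.)

import torch

# Hedged sketch: weight the three training losses but zero out the flow-loss
# weight (w3) so the depth-dependent term contributes nothing.
w1, w2, w3 = 10.0, 0.01, 0.0  # w3 = 0.0 disables the flow loss

# stand-ins for the loss tensors the real training loop would produce
pose_loss = torch.tensor(1.0, requires_grad=True)
residual_loss = torch.tensor(1.0, requires_grad=True)
flow_loss = torch.tensor(1.0, requires_grad=True)

total_loss = w1 * pose_loss + w2 * residual_loss + w3 * flow_loss
total_loss.backward()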
Thanks, @zachteed for the quick reply!
Are you using depth from monodepth2 and ground truth poses?
Yes, depth from monodepth2 and ground-truth poses from Codemasters.
Could you provide more information about your training setup?
Below are the details of my training setup.
I am trying to train the model on Codemasters data, which provides all the pose information; we converted it to EuRoC format.
For training, I am using all default parameters:
edges=24, fmax=96.0, fmin=8.0, gpus=1, iters=15, lr=0.00025, n_frames=7, name='bla', noise=False, restart_prob=0.2, scale=False, steps=250000, w1=10.0, w2=0.01, w3=0.05, world_size=1
The first error I found is that our pose file has an extra timestamp column and is in EuRoC format, so I changed the lines below in tartan.py:
# tx ty tz qx qy qz qw (orig)
poses = poses[:, [1, 2, 0, 4, 5, 3, 6]]

to

# timestamp tx ty tz qw qx qy qz (codemasters)
poses = poses[:, [2, 3, 1, 6, 7, 5, 4]]
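(For reference, a tiny self-contained example of what that remap does to one EuRoC-style row; the numbers are made up.)

import numpy as np

# one made-up EuRoC-style row: timestamp tx ty tz qw qx qy qz
row = np.array([[1403636579.0, 1.0, 2.0, 3.0, 1.0, 0.0, 0.0, 0.0]])

# Drop the timestamp and apply the same permutation the original TartanAir
# mapping [1, 2, 0, 4, 5, 3, 6] performs on "tx ty tz qx qy qz qw",
# i.e. output columns become: ty tz tx qy qz qx qw
print(row[:, [2, 3, 1, 6, 7, 5, 4]])  # -> [[2. 3. 1. 0. 0. 0. 1.]]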
But after the above change, I always get the message below (which is printed by line 28 in factory.py):
Dataset tartan has 0 images.
I also make sure to delete the cache file, which is saved as "DROID-SLAM/droid_slam/data_readers/cache/TartanAir.pickle", before each run.
On further debugging, I found that a condition in _build_dataset_index in base.py at line 60 is never triggered. Also, when I checked the scene in _build_dataset in tartan.py, it has all the images, so it looks like a problem with some setting.
I then increased max_flow in build_frame_graph from 256 to 1000; the graph was built, and the dataset now reports roughly 2600 images.
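(For intuition, a rough, self-contained sketch of the kind of co-visibility check a flow-based frame graph uses: a frame pair is kept only if its mean flow magnitude falls below a max_flow threshold, so raising the threshold admits more pairs. The function and array names are placeholders, not the repo's build_frame_graph.)

import numpy as np

def keep_pair(flow_uv, max_flow=256.0):
    # flow_uv: (H, W, 2) flow between two frames. Keep the pair when the
    # average flow magnitude is below max_flow; raising max_flow (e.g. to
    # 1000) lets far-apart / fast-motion pairs into the graph.
    mag = np.linalg.norm(flow_uv, axis=-1)
    return float(mag.mean()) < max_flow

# toy example: uniform 300-pixel flow is rejected at 256 but accepted at 1000
flow = np.full((480, 640, 2), 300.0 / np.sqrt(2))
print(keep_pair(flow, max_flow=256.0))   # False
print(keep_pair(flow, max_flow=1000.0))  # True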
I trained for 3000 iterations from scratch, and this time I didn't get any singular-matrix messages, but the results seem to be the same.
It looks like I am still missing some setting or information, but I'm not sure what.
I checked TensorBoard: residual, rot_error, and tr_error are decreasing, but f_error is not.
Also, I tried to load the checkpoint for fine-tuning, but checkpoint loading fails. I am using just 1 GPU to train.
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
size mismatch for module.update.weight.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
size mismatch for module.update.weight.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for module.update.delta.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
size mismatch for module.update.delta.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
Thanks!
Best, KP!
I was able to fix the pre-trained (droid.pth) model loading issue by changing the model loading in train.py, lines 57-58, from:
if args.ckpt is not None:
    model.load_state_dict(torch.load(args.ckpt))
to
if args.ckpt is not None:
    state_dict = torch.load(args.ckpt)
    # the released droid.pth checkpoint has 3 output channels for these layers,
    # while the training model expects 2, so keep only the first 2 channels
    state_dict["module.update.weight.2.weight"] = state_dict["module.update.weight.2.weight"][:2]
    state_dict["module.update.weight.2.bias"] = state_dict["module.update.weight.2.bias"][:2]
    state_dict["module.update.delta.2.weight"] = state_dict["module.update.delta.2.weight"][:2]
    state_dict["module.update.delta.2.bias"] = state_dict["module.update.delta.2.bias"][:2]
    model.load_state_dict(state_dict)
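(A more generic variant of the same workaround, sketched here as an assumption rather than taken from the repo: slice any checkpoint tensor whose leading dimension is larger than the model's.)

import torch

def truncate_mismatched(state_dict, model):
    # Hedged helper: for each parameter whose checkpoint shape disagrees with
    # the model's only in the first dimension, keep the leading slice so that
    # load_state_dict succeeds. Everything else is left untouched.
    model_state = model.state_dict()
    for name, tensor in state_dict.items():
        target = model_state.get(name)
        if target is not None and tensor.shape != target.shape and tensor.shape[1:] == target.shape[1:]:
            state_dict[name] = tensor[: target.shape[0]]
    return state_dict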
But I still didn't see much difference in the output; it looks like something is wrong with my data or with how I am feeding it to the network.
Thanks!
Best, KP
@zachteed
I disabled flow_loss by setting w3 = 0. Also, in the training loop I disabled the build_frame_graph code, which uses depth to create the frame graph.
Residual error, rotation error, and translation error seem to be decreasing.
But the error still seems too high, and the output looks like garbage.
I used the same demo.py settings and didn't change anything.
Any further suggestions on what could be wrong?
Thanks!
Best, KP!
@zachteed I fine-tuned the model on a single TartanAir scene (abandonedfactory, easy) for 2000 iterations, and the results inferred with the fine-tuned model are fine. I did this experiment to make sure the end-to-end process works.
[Backend Optimization OFF].
[Backend Optimization ON].
I also ran a new experiment to check whether fine-tuning works without depth on the same TartanAir scene (abandonedfactory, easy), and from the results I see the same issue as with my own dataset: fine-tuning does not work.
[Backend Optimization OFF].
I am stuck at this point and need your advice on how to move forward. What could be wrong with the code or the data, or do I need to find different settings for fine-tuning the model without depth?
Thanks.
Best, KP!
Depth maps from monodepth2 are not spatially or temporally consistent. Maybe that is the problem?
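(A quick, hedged way to check for that: compare a per-frame depth statistic across the sequence; large jumps in the ratio between consecutive frames suggest the monocular depths are not on a consistent scale. The file layout below is illustrative.)

import glob
import numpy as np

# Illustrative check: median depth per frame across a sequence. If the ratio
# between consecutive frames jumps around, the monocular depths are not
# temporally consistent in scale.
depth_files = sorted(glob.glob("depths/*.npy"))  # hypothetical layout
medians = [float(np.median(np.load(f))) for f in depth_files]
ratios = [b / a for a, b in zip(medians[:-1], medians[1:])]
print("median-depth ratios between consecutive frames:", ratios[:10])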
@zachteed Hi! Thank you very much for sharing the code of DROID-SLAM. I've trained for 80,000 iterations on the TartanAir dataset so far, but why hasn't the loss shown any sign of convergence?
@kbpachauri Hi, how did you manage to use monodepth2 depth images along with the RGB images? In my case I get a dimension error for the depth images; the error is below:
File "/home/ubuntu/droid2/DROID-SLAM/droid_slam/geom/projective_ops.py", line 130, in induced_flow
    ht, wd = disps.shape[2:]
ValueError: too many values to unpack (expected 2)
thanks
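(That unpacking error suggests disps has more than four dimensions at that point, most likely because the depth maps were stacked with an extra trailing channel axis. A hedged sketch of the kind of fix, using an illustrative shape rather than anything confirmed from the repo:)

import torch

# The line `ht, wd = disps.shape[2:]` expects disps to be 4-D:
# (batch, num_frames, height, width). If the depth maps carry an extra
# channel axis, e.g. (batch, num_frames, height, width, 1), drop it first:
disps = torch.rand(1, 7, 384, 512, 1)  # illustrative shape with a stray axis
if disps.dim() == 5 and disps.shape[-1] == 1:
    disps = disps.squeeze(-1)
ht, wd = disps.shape[2:]
print(ht, wd)  # 384 512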
@kbpachauri Hello, I have the same question as you. Could I contact you privately? I have some questions I would like to ask. My email address is 1063062177@qq.com and my WeChat ID is 18186416709; please feel free to contact me by whichever method is convenient for you. Thank you very much.
@zachteed A few more questions:
- What are bad_rot and bad_tr?
- You mentioned training with only the two other losses, but how will the frame graph be built if we don't have depth? Do you mean to still use disparity to build the frame graph but disable the flow loss?
- How is DEPTH_SCALE decided? Currently it is set to 5 for TartanAir, while monodepth2 scales disparity to the range 0.1 to 100.
- How can the pre-trained checkpoint loading be fixed?
- Do I need to change something for the ground-truth poses, or is there other information I am missing?
Thanks!
Best, KP!
Hello, I have been stuck here for a long time now. May I ask whether you have solved this problem? If it is convenient, could you share contact information so we can discuss the details? My WeChat is 18186416709 and my email is 1063062177@qq.com.
@kbpachauri Hello, I have benefited a lot from reading the questions above, thank you. I also want to train on a monocular RGB dataset for which I can obtain corresponding depth maps and poses through AirSim. Did you manage to train successfully? Considering that a lot of time has passed and it may be hard for you to recall the details, may I ask a few simple questions? 1. How should the dataset files be laid out when training with the official datasets? 2. How should they be laid out when using your own dataset? 3. How are the files laid out when fine-tuning the model on (abandonedfactory, easy), and what is the command? I really need your help, thank you very much. Looking forward to your reply.
Hello, have you solved this problem? How did you arrange the dataset structure?
Hi,
In the paper it is mentioned that DROID-SLAM trained on monocular data generalizes to stereo as well as to RGB-D. But when I checked the training code, especially with the TartanAir data, it looks like we need depth as well.
Can you please confirm whether this understanding is correct, or am I missing something?
Thanks!
Best, KP!