noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License

training with Nuscenes dataset #137

Closed wasup07 closed 3 months ago

wasup07 commented 4 months ago

Hello, I read your article and appreciate your work on Lite-Mono. I would like to train your model (using the pre-trained backbone) on the nuScenes dataset. I have a slight problem: I don't understand from the article how the dataset should be split and what structure it should follow. Is it enough to have three folders (train, val and test), and is it unnecessary to divide the images according to the camera angle they were taken from? I plan to train the model in a self-supervised way.

wasup07 commented 4 months ago

This is my dataset structure (screenshot attached).

noahzn commented 4 months ago

Hi, you can take a look at the KITTI data loader (https://github.com/noahzn/Lite-Mono/blob/main/datasets/kitti_dataset.py); you need to write your own nuScenes data loader. You can choose one camera, and the frame-ids argument in options.py decides which frames are loaded as the previous, current, and next frame. If consecutive frames in your dataset are very close together (i.e., the car moved very slowly), you can try something like [0, -5, 5], which means you use the current frame, the -5th frame, and the +5th frame for training. Two or three splits are enough.
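
For example, a rough standalone sketch (hypothetical indices, not code from the repo) of what such a wider spacing means for one training sample:

```python
# With frame ids [0, -5, 5], each training sample uses the target frame plus
# the frames 5 steps behind and 5 steps ahead of it.
frame_ids = [0, -5, 5]
current_index = 120                       # hypothetical target-frame index
loaded = [current_index + i for i in frame_ids]
print(loaded)                             # [120, 115, 125]
```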

wasup07 commented 4 months ago

Thank you for your answer. Does this mean that I can only choose one camera angle and cannot train on all six of them simultaneously?

wasup07 commented 4 months ago

And do I have to divide my dataset into train, val and test?

noahzn commented 4 months ago

Yes, you can only choose one camera, as this is monocular self-supervised training. A train and a test set should be enough. If you have more data, you can split an additional val set.

wasup07 commented 4 months ago

Thank you very much for your time. Could I ask you to put this Issue on hold for 3 weeks or one month, as I may come back to you with further questions?

noahzn commented 4 months ago

Ok, no problem.

wasup07 commented 4 months ago

Hello Noah,

I'd like to ask you a question. After re-reading your scripts (options.py, kitti_dataset.py, trainer.py) and checking the KITTI raw dataset, I got the impression that you trained your model on grayscale images, unlike the nuScenes dataset (which I plan to use). So I'd like to check which files and folders are actually used (screenshot attached): for the moment I know that the Velodyne files are used, but for the rest I'm not sure.

noahzn commented 4 months ago

Hi @wasup07 , I used RGB images for training (See the code here).

If you prepare the KITTI dataset according to the README, you will get training and val splits, which look something like this file.
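
For orientation, each line of those split files has three whitespace-separated fields (folder, frame index, camera side); the folder name below is only illustrative:

```python
# Hedged sketch of one KITTI-style split line and its three fields.
line = "2011_09_26/2011_09_26_drive_0001_sync 473 r"   # illustrative folder name
folder, frame_index, side = line.split()
print(folder, int(frame_index), side)                  # -> folder, 473, 'r'
```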

wasup07 commented 4 months ago

I really appreciate your quick response.

I've already seen the steps for splitting the data, but I'd like to know which files other than the images are used, so that I can determine, by comparison with the nuScenes dataset, what I should delete or add in the trainer.py file. Also, is the purpose of the MonoDataset class just to load the images and augment them?

noahzn commented 4 months ago

When you get the splits, the .txt files will show you all the images used in training and val, so you can check the files manually according to their paths. mono_dataset.py contains a base class used throughout the project, and any dataset-specific class should inherit from it, which makes your life easier. For example, for your nuScenes dataset you can simply create a new file nuscenes_dataset.py and define the class as class NUSCENESDataset(MonoDataset). Then you can put any dataset-specific loading code in this class, and you don't need to change the base class.
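
As a rough starting point, a minimal sketch of what such a file could look like (method names follow the Monodepth2-style MonoDataset base class; the intrinsics, resolution and path layout below are placeholders you would replace with your own nuScenes values):

```python
# datasets/nuscenes_dataset.py -- hedged sketch, not code from the repo.
import os

import numpy as np
import PIL.Image as pil

from .mono_dataset import MonoDataset


class NUSCENESDataset(MonoDataset):
    """Loads monocular nuScenes frames from one chosen camera."""

    def __init__(self, *args, **kwargs):
        super(NUSCENESDataset, self).__init__(*args, **kwargs)
        # Intrinsics normalized by image width/height and padded to 4x4,
        # mirroring the KITTI loader's convention; values are placeholders
        # to be filled from calibrated_sensor.json for the chosen camera.
        self.K = np.array([[0.79, 0,    0.51, 0],
                           [0,    1.41, 0.55, 0],
                           [0,    0,    1,    0],
                           [0,    0,    0,    1]], dtype=np.float32)
        self.full_res_shape = (1600, 900)   # nuScenes camera resolution

    def check_depth(self):
        # No ground-truth depth is needed for self-supervised training.
        return False

    def get_color(self, folder, frame_index, side, do_flip):
        # Assumes images are stored as <data_path>/<folder>/<frame_index>.jpg;
        # adjust to however you exported the nuScenes frames.
        path = os.path.join(self.data_path, folder, "{:06d}.jpg".format(frame_index))
        color = pil.open(path).convert("RGB")
        if do_flip:
            color = color.transpose(pil.FLIP_LEFT_RIGHT)
        return color
```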

wasup07 commented 4 months ago

Thanks for your quick reply,

I wrote my own nuscenes_dataset file, but before that I checked that the model works when trained on the KITTI dataset: I used the eigen_zhou split to separate the val and train sets and didn't change anything else, but I still get the following error (screenshot attached), which I couldn't solve even in debugger mode in VS Code.

I think there's an error in this section, but I'm not sure (screenshot attached):

Instead of features = self.models["encoder"](inputs["color_aug", 0, 0]) it should perhaps be features = self.models["encoder"](inputs[("color_aug", 0, 0)]).
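
(Though a quick standalone check, independent of the repo, suggests the two forms address the same dictionary entry, since Python treats a comma-separated index as a tuple key:)

```python
# Standalone check: d[a, b, c] and d[(a, b, c)] resolve to the same tuple key.
inputs = {("color_aug", 0, 0): "dummy tensor"}
assert inputs["color_aug", 0, 0] is inputs[("color_aug", 0, 0)]
print("both index forms return the same value")
```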

noahzn commented 4 months ago

I think the error comes from an incompatibility between your PyTorch and CUDA versions.

wasup07 commented 4 months ago

Yes, it seems that when I tried to update it, it didn't work, but when I searched for a specific version, it did. I have a couple of questions:

wasup07 commented 4 months ago

Another question: since I'm not using a stereo camera, can I deactivate this option?

noahzn commented 4 months ago
  1. It is possible. Please make sure that each image pair used for training is consistent; you cannot mix up left_back and right_back.
  2. Yes.
  3. Yes, you can resize the input. Lite-Mono also supports different image sizes for training; you can set that in options.py.
  4. Lite-Mono is only for monocular training. Just don't set --use_stereo and it's OK.
wasup07 commented 4 months ago

Thank you for your quick reply.

There are a few points I don't understand.

  1. "Please make sure that each pair of images for training is consistent": could you explain this further? To me it doesn't look like different sides of the same scene are paired during training (left and right); that's what I noticed when I checked the eigen_zhou folder.
  2. Yes, but just to make sure, do I need to turn the 3x3 matrix into a 4x4 matrix, or can I leave it as a 3x3 matrix?
noahzn commented 4 months ago
  1. Yes, so use images from either the left camera or the right camera.
  2. It's a 4x4 matrix, see the code here (a rough sketch follows below).
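
For reference, a rough sketch (values are hypothetical, chosen to resemble a nuScenes front camera, not read from your calibration file) of embedding a 3x3 pinhole matrix into the normalized 4x4 form the KITTI-style loader stores:

```python
import numpy as np

# Hypothetical pixel-space intrinsics (fx, fy, cx, cy) for one camera.
fx, fy, cx, cy = 1266.4, 1266.4, 816.3, 491.5
width, height = 1600, 900                     # nuScenes image resolution

# The KITTI-style loaders keep K normalized by image size and padded to 4x4,
# so it can later be rescaled to whatever training resolution is chosen.
K = np.array([[fx / width, 0,           cx / width,  0],
              [0,          fy / height, cy / height, 0],
              [0,          0,           1,           0],
              [0,          0,           0,           1]], dtype=np.float32)
print(K)
```
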
wasup07 commented 4 months ago
  1. "Yes, so use images from the left camera or the right camera": but that's not what I noticed in the eigen_zhou train_files.txt: it contains both left and right images, and consecutive lines don't match chronologically as "[0, -1, 1]".
wasup07 commented 4 months ago

Here is an example of the file names in train_files.txt (screenshot attached):

noahzn commented 4 months ago

Yes, the first sample uses frames 473r, 472r, 474r (0, -1, 1). Please make sure that you define the frames correctly.

wasup07 commented 4 months ago

"Yes, the first sample uses frames 473r, 472r, 474r (0, -1, 1). Please make sure that you define the frames correctly": sorry, perhaps I wasn't paying close enough attention, but I can't see how the images are matched (I can't find images 473r, 472r, 474r in train_files.txt for the KITTI dataset).

noahzn commented 4 months ago

Please check the __getitem__ function here. Specifically, this line.
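
Paraphrasing (not quoting) that part of the base class: for each offset i in the frame-id list, the sample stores the image at frame_index + i under the key ("color", i, -1). A standalone sketch with illustrative names:

```python
# Standalone sketch of the neighbour-loading step in MonoDataset.__getitem__:
# a split entry with frame index 473, side 'r', and frame ids [0, -1, 1]
# pulls frames 473, 472 and 474 from the same camera side.
frame_index, side = 473, "r"
frame_idxs = [0, -1, 1]

inputs = {}
for i in frame_idxs:
    inputs[("color", i, -1)] = f"frame {frame_index + i}{side}"

print(inputs)
# {('color', 0, -1): 'frame 473r', ('color', -1, -1): 'frame 472r', ('color', 1, -1): 'frame 474r'}
```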

wasup07 commented 4 months ago

Hello, thank you again for your quick reply!

I've managed to start training on the nuScenes dataset and I no longer have any problems with it, but I do have a small issue and would like your suggestion.

I haven't been able to use TensorBoard because of a PyTorch version incompatibility, and I can't change the PyTorch version, otherwise I won't be able to train Lite-Mono.

So do you have an idea of how to follow the training without using TensorBoard?

noahzn commented 4 months ago

Hi, you can install tensorboardX.

wasup07 commented 4 months ago

I've tried, but it requires updating the PyTorch version (screenshot attached):

noahzn commented 4 months ago

You can install using pip install tensorboardX

wasup07 commented 4 months ago

I've tried it, and it seems to be a common TensorBoard problem (I've had the same feedback from a colleague who uses TensorBoard on a similar project).

noahzn commented 4 months ago

Or you can try wandb.

wasup07 commented 4 months ago

In that case, I suppose I need to modify the trainer.py file?

noahzn commented 4 months ago

Yes, you need to add some lines of code.
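
For what it's worth, a minimal sketch of the kind of lines that could be added (assuming the trainer computes a dict of scalar losses each step, as Monodepth2-style trainers do; the project and key names are placeholders):

```python
# Hedged sketch of wandb logging to drop into trainer.py; adapt the names to
# the actual variables used there.
import wandb

wandb.init(project="lite-mono-nuscenes")        # hypothetical project name

def log_losses(losses, step):
    # losses: dict of scalar tensors/floats computed for the current batch.
    wandb.log({k: float(v) for k, v in losses.items()}, step=step)
```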

wasup07 commented 4 months ago

Hello, thank you for all your help over the last couple of weeks. I wanted to ask how I can generate the depth estimation images at the end and how to evaluate them.

noahzn commented 4 months ago

https://github.com/noahzn/Lite-Mono/blob/main/evaluate_depth.py This file generates depth predictions and compares them with the ground truth.

wasup07 commented 4 months ago

Thank you again for your quick reply!

I would like to ask where I can find the following error metrics (screenshot attached):

I didn't find them in the trainer.py file, if I'm not mistaken.

noahzn commented 4 months ago

https://github.com/noahzn/Lite-Mono/blob/main/evaluate_depth.py#L42
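
For reference, those are the standard Eigen-style monocular depth metrics; a condensed sketch of their definitions (paraphrased, not copied verbatim from the file):

```python
import numpy as np

def compute_errors(gt, pred):
    """abs_rel, sq_rel, rmse, rmse_log and the delta < 1.25^k accuracies."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)

    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```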

wasup07 commented 4 months ago

Hi Noah,

I'd like to get your feedback on training Lite-Mono on the nuScenes dataset. I trained the model several times on a subset of nuScenes just to check the model's effectiveness. The subset was about 1250 images from the central rear camera (no stereo camera): 1000 images for the training set and 250 for the validation set. The problem is that the training loss always oscillates between 0.06 and 0.08, so it is not stable and does not decrease, unlike what I observed with the KITTI dataset. I tried tuning the --lr option but nothing changed. Attached are the mono_dataset file, the nuscenes_dataset file and the options file. Could you give me your opinion on what could be the problem or what I should change in these files? I have not modified the trainer file. I've also added the JSON files with the camera information. calibrated_sensor.json

sensor.json options_or.txt mono_dataset_nuscenes.txt nuscenes_dataset.txt

wasup07 commented 4 months ago

Thanks in advance

noahzn commented 3 months ago

Hi, using 1000 images for training is not enough for this self-supervised method. Did you check the visualizations in TensorBoard?

noahzn commented 3 months ago

I am now closing this issue as there is no further update.