ifnspaml / SGDepth

[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
MIT License

Training the depth task on A2D2 #16

Ale0311 opened this issue 3 years ago

Ale0311 commented 3 years ago

Hello,

I am trying to train the depth task on A2D2 as well. I created all the necessary data and the .json files, and I wrote the two train and validation dataloaders. However, I get this error:

Traceback (most recent call last):
  File "train.py", line 378, in <module>
    trainer = Trainer(opt)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/harness.py", line 35, in __init__
    self._init_train_loaders(opt)
  File "train.py", line 72, in _init_train_loaders
    for loader_name in opt.depth_training_loaders.split(',') if (loader_name != '')
  File "train.py", line 72, in <genexpr>
    for loader_name in opt.depth_training_loaders.split(',') if (loader_name != '')
  File "/home/diogene/Documents/Alexandra/SGDepth-master/loaders/depth/train.py", line 48, in a2d2_train
    **cfg_common
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/specialdatasets.py", line 15, in __init__
    super(StandardDataset, self).__init__(*args, **kwargs)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/basedataset.py", line 116, in __init__
    video_frames, folders_to_load, files_to_load, n_files)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/basedataset.py", line 397, in read_json_file
    indices = np.array(data_positions[name])[:, 3]
IndexError: too many indices for array

I believe it has something to do with the frames used for depth, because I set 'video_mode' to video in the depth training dataloader. However, if I set 'video_mode' to mono, I get this error:

Traceback (most recent call last):
  File "train.py", line 379, in <module>
    trainer.train()
  File "train.py", line 347, in train
    self._run_epoch()
  File "train.py", line 265, in _run_epoch
    loss_depth += self._process_batch_depth(dataset, output, output_masked, batch_idx, domain_name)
  File "train.py", line 198, in _process_batch_depth
    losses_depth = self.depth_losses.compute_losses(dataset, output, output_masked)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/losses/depth.py", line 143, in compute_losses
    losses = self._reprojection_losses(inputs, outputs, outputs_masked)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/losses/depth.py", line 62, in _reprojection_losses
    identity_reprojection_loss = torch.cat(identity_reprojection_loss, 1)
RuntimeError: expected a non-empty list of Tensors

This is because frame_ids is an empty tuple. What can I do to fix it?

Thank you very much!

Ale0311 commented 3 years ago

I think it is because in basic_files.json my positions look like this: [0, 0, 0, 0], [1, 0, 0, 1], [2, 0, 0, 2], ...

Why are the middle values 0 and 0, and how can I modify them? Do I have to write a script to modify basic_files.json, or can it be done with your scripts?

Thank you!

Ale0311 commented 3 years ago

OK, here is an update. I have managed to change the positions in basic_files.json with this script:

import os
import json

dataset_path = "/home/diogene/Documents/Alexandra/SGDepth-master/Dataset/a2d2"
DP = "/home/diogene/Documents/Alexandra/SGDepth-master/Dataset/a2d2/{}/label/cam_front_center"

# Number of frames in each of the 21 A2D2 sequences, precomputed once with
# the commented-out loop below.
number_files = [942, 952, 1089, 993, 1013, 975, 870, 1804, 2252, 571, 950, 188, 1353, 1421, 2212, 2868, 2307, 741, 2823, 969, 1344]

# for f in sorted(os.listdir(dataset_path)):
#     if os.path.isdir(os.path.join(dataset_path, f)) and 'segmentation' not in f:
#         files = os.listdir(DP.format(f))
#         number_files.append(len(files))
#
# print(number_files)

with open(os.path.join(dataset_path, 'basic_files.json')) as file:
    bf = json.load(file)

positions = bf['positions']
print(positions)

new = []

# Rebuild each positions entry as [global_index, n_preceding_frames,
# n_succeeding_frames, global_index], counting within each sequence.
for pos in positions:
    one = []
    sequence = 0
    counter = 0
    for i, p in enumerate(pos):
        if counter == number_files[sequence]:
            sequence += 1
            counter = 0
        if sequence == 21:
            break

        p = [i, counter, number_files[sequence] - counter - 1, i]
        one.append(p)
        counter += 1
    new.append(one)

print(new)
print(len(positions[0]))

bf['positions'] = new

with open(os.path.join(dataset_path, 'basic_files_pos.json'), 'w') as fp:
    json.dump(bf, fp)

However, this is the error I get now, and I am really confused. What does it mean and why do I get it? Before making this change to the .json file, the warp_images function worked perfectly well; I know that because the previous error occurred two lines below this one, at the reprojection loss calculation.

Traceback (most recent call last):
  File "train.py", line 379, in <module>
    trainer.train()
  File "train.py", line 347, in train
    self._run_epoch()
  File "train.py", line 265, in _run_epoch
    loss_depth += self._process_batch_depth(dataset, output, output_masked, batch_idx, domain_name)
  File "train.py", line 189, in _process_batch_depth
    predictions_depth = self.resample.warp_images(dataset, output, output_masked)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/perspective_resample.py", line 183, in warp_images
    for pointcloud in pointclouds_target
  File "/home/diogene/Documents/Alexandra/SGDepth-master/perspective_resample.py", line 183, in <genexpr>
    for pointcloud in pointclouds_target
  File "/home/diogene/Documents/Alexandra/SGDepth-master/perspective_resample.py", line 87, in _to_sample_grid
    grid = (cam @ pointcloud.unsqueeze(-1)).squeeze(-1)
RuntimeError: invalid argument 6: wrong matrix size at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/generic/THCTensorMathBlas.cu:494

Thanks again!

klingner commented 3 years ago

Hello,

I think you identified the first error correctly: the numbers in basic_files.json need to be modified such that they contain the number of preceding available frames and the number of succeeding available frames, so an entry like [2, 2, 2, 2] would describe the third frame of a 5-frame sequence (index, two frames before, two frames after, index). Up to now I have only used this dataset for segmentation training, never for depth training, but your modifications to basic_files.json (or at least the idea behind them) look alright to me.

Did you compare the dataloader output (the input to the network and to the loss computation) with the output supplied by the KITTI loader? Do they match? That would be my strategy to verify that at least the data loading works correctly.

Ale0311 commented 3 years ago

Hello! Thank you for your response. I managed to start the training process. It turns out that the last error was related to the K matrix's dimensions: mine was 3x3, while the one provided for the KITTI dataset is 4x4. I solved this by padding it with [0, 0, 0, 1] both as a column and as a row.
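For reference, a minimal sketch of that padding (my own illustration, not code from the repo):

import numpy as np

# Pad the 3x3 A2D2 intrinsics to the 4x4 homogeneous form that the KITTI
# loader supplies; the appended row and column are [0, 0, 0, 1].
K3 = np.array([[825.102, 0.0, 959.462],
               [0.0, 824.194, 642.449],
               [0.0, 0.0, 1.0]])
K4 = np.eye(4)
K4[:3, :3] = K3
print(K4)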

I cannot get the validation dataloader to work, though. For training I simply commented that line out, but I need it for the evaluation script. This is the error I get:

Traceback (most recent call last):
  File "eval_depth.py", line 98, in <module>
    evaluator.evaluate()
  File "eval_depth.py", line 26, in evaluate
    scores, ratios, images = self._run_depth_validation(self.val_num_log_images)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/harness.py", line 317, in _run_depth_validation
    for batch in self.depth_validation_loader:
  File "/home/diogene/Documents/Alexandra/SGDepth-master/loaders/__init__.py", line 39, in __iter__
    for batch in loader:
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/PIL/Image.py", line 2649, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 3), '<f8')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/basedataset.py", line 183, in __getitem__
    sample = self.data_transforms(sample)
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/mytransforms.py", line 271, in __call__
    sample[key] = pil.fromarray(sample[key])
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/PIL/Image.py", line 2651, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type

And this is the dataloader:

def a2d2_validation(img_height, img_width, batch_size, num_workers):
    """A loader that loads images and depth ground truth for
    depth validation from the a2d2 validation set.
    """

    transforms = [
        tf.CreateScaledImage(True),
        tf.Resize(
            (img_height, img_width),
            image_types=('color', )
        ),
        tf.ConvertDepth(),
        tf.CreateColoraug(),
        tf.ToTensor(),
        tf.NormalizeZeroMean(),
        tf.AddKeyValue('domain', 'a2d2_val_depth'),
        tf.AddKeyValue('validation_mask', 'validation_mask_a2d2'),
        tf.AddKeyValue('validation_clamp', 'validation_clamp_a2d2'),
        tf.AddKeyValue('purposes', ('depth', )),
    ]

    dataset = StandardDataset(
        dataset='a2d2',
        split='andreas_split',
        trainvaltest_split='validation',
        video_mode='mono',
        stereo_mode='mono',
        keys_to_load=('color', 'depth'),
        data_transforms=transforms,
        video_frames=(0, ),
        disable_const_items=True
    )

    loader = DataLoader(
        dataset, batch_size, False,
        num_workers=num_workers, pin_memory=True, drop_last=False
    )

    print(f"  - Can use {len(dataset)} images from the a2d2 (andreas_split) validation set for depth validation",
          flush=True)

    return loader

Could the error be related to the way the gt files are stored? For the A2D2 dataset I created the depth maps from the lidar points and stored them as .png files with this call:

cv2.imwrite(path, undist_image_front_center), where undist_image_front_center is a np.uint8 array.

Thank you!

klingner commented 3 years ago

Hi, if there is no predefined format in the A2D2 dataset, then I would store the images as 1-channel 16-bit uint images (this seems to be the most common convention on other datasets); it is also what the KITTI validation loader uses. The error indeed looks as if the images are in a format that pil.fromarray cannot handle. A cross-check against the format of the KITTI gt at this point in the code might help.

Later on, when you convert the depth gt maps to the final format, you should take care that the correct depth_mode is specified in the parameters.json of A2D2. Here, I would use 'depth_mode': 'uint_16' in case you decide on the KITTI format for storing depth images. Note that the depth values in the stored uint16 image are 255 times larger than the actual depth values; when you load the images, the tf.ConvertDepth() transform will divide by 255 if 'depth_mode': 'uint_16' is specified in parameters.json.
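A minimal sketch of this storage convention (my own illustration; the helper name is made up):

import cv2
import numpy as np

def save_depth_uint16(path, depth_m):
    # depth_m: HxW float array of metric depths (0 where there is no lidar return).
    # Scale by 255 as described above, so that tf.ConvertDepth() can divide it out.
    depth_uint16 = np.clip(depth_m * 255.0, 0, 65535).astype(np.uint16)
    cv2.imwrite(path, depth_uint16)  # OpenCV writes 1-channel 16-bit PNGs natively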

Hope this helps!

Ale0311 commented 3 years ago

Hello! Thanks for the detailed explanation, it did indeed help me. However, I did not recreate the gt images; instead I added this line of code just above the line that raised the exception: sample[key] = sample[key][:,:,0]. It is a quick fix, I know, but I will change it in the future.
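For context, a small repro of the failure (my own sketch, not code from the repo): PIL's Image.fromarray() has no mode for 3-channel float64 arrays, while a single float channel maps to mode 'F', which is why selecting channel 0 works:

import numpy as np
from PIL import Image

depth = np.zeros((4, 4, 3), dtype=np.float64)
try:
    Image.fromarray(depth)       # fails: no PIL mode for ((1, 1, 3), '<f8')
except TypeError as err:
    print(err)                   # "Cannot handle this data type"

Image.fromarray(depth[:, :, 0])  # a 2-D float64 array maps to PIL mode 'F'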

There seems to be a problem with the depth training, though. These are the images after the evaluation:

[Screenshot 2021-03-15 at 19:06:33: depth predictions after evaluation]

It seems that all the pixels have the same value. I saved the images (the rectified ones used for training) as uint8, so both my gt and my training images are uint8. Could this be the problem? Or maybe it is because I set the stereo_T parameter to 0? And why exactly do I need this parameter? I only use images from a front camera, but I cannot set it to null, because that triggers an assertion error: it has to have a value.

Thanks again!

klingner commented 3 years ago

From the information you supplied, I would say that the format the images are stored in should not have an influence on the depth training, as long as they are stored and loaded correctly. Also, the stereo_T parameter is not used when training on sequences, so it should not matter which value you set. A value of 0 is, however, not meaningful, as it would correspond to two cameras at the exact same position.

Ale0311 commented 3 years ago

Hello,

As far as I can tell, the images are stored and loaded correctly, so I really do not know what the problem might be. A good idea might be to try another dataset and see whether the results are the same.

I noticed something about the K matrix, though. For the KITTI training, it has these values:

[[0.58, 0, 0.5, 0], [0, 1.92, 0.5, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

However, I used this one for A2D2:

[[ 825.10199941612268, 0.0, 959.46189953054625], [0.0, 824.19398027825866, 642.44891736910677 ], [ 0.0, 0.0, 1.0]]

A long shot, but could this be the problem? And if so, what K matrix should I use for the A2D2 dataset? Thanks!

klingner commented 3 years ago

Yes, you are right, this matrix is in the wrong format for the code. The values in the first row need to be divided by the width of the image (in pixels), and the values in the second row by its height. The K matrix in this sense stores f_x and f_y per pixel, and the same holds for the principal point. When loaded, the K matrix is scaled to the target resolution of the image in the Resize() transform.

klingner commented 3 years ago

Also, the matrix should be stored as a 4x4 matrix. Is this already the case for A2D2 in your code?

Ale0311 commented 3 years ago

Yes, mine was also 4x4, because I padded it with [0, 0, 0, 1] both on the last row and the last column.

From your comments in the code I understood that the K matrix is actually the extrinsic matrix. I have now restarted training with this matrix, which I found on the A2D2 website:

Homogeneous transformation matrix from global coordinates to view point coordinates ("extrinsic" matrix)

[[ 9.96714314e-01  8.09967396e-02 -3.24531964e-04 -1.75209613e+00]
 [-8.09890350e-02  9.96661051e-01  1.03694477e-02 -4.49267371e-01]
 [ 1.16333982e-03 -1.03090934e-02  9.99946183e-01 -9.39105431e-01]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

However, given that the images are 1920 × 1208, if I divide the intrinsics matrix by the dimensions you mentioned, the numbers won't match this matrix.

klingner commented 3 years ago

No, I think this is a misunderstanding. The matrix is still the intrinsics matrix, and the matrix you originally had was nearly correct. The matrix you are looking for would rather look like this:

[[825/1920, 0, 959/1920, 0], [0, 824/1208, 642/1208, 0], [0,0,1,0], [0,0,0,1]]

Here, the camera intrinsics are simply divided by the resolution of the image.
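As a quick sketch of that computation (my own illustration, not repo code):

import numpy as np

# Normalize the A2D2 front-center intrinsics (1920x1208 images): divide the
# first row by the image width and the second row by the image height, then
# pad to the 4x4 form the code expects.
K = np.array([[825.10199941612268, 0.0, 959.46189953054625],
              [0.0, 824.19398027825866, 642.44891736910677],
              [0.0, 0.0, 1.0]])
width, height = 1920, 1208

K_norm = np.eye(4)
K_norm[:3, :3] = K
K_norm[0, :] /= width   # f_x and c_x in units of image widths
K_norm[1, :] /= height  # f_y and c_y in units of image heights
print(K_norm)           # ~[[0.430, 0, 0.500, 0], [0, 0.682, 0.532, 0], ...]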

Ale0311 commented 3 years ago

OK, great, all clear now. It was a bit confusing because in the project there was this:

[screenshot: a code comment describing K as the extrinsic camera matrix]

And then in your previous comment you mentioned that the values come from the intrinsics.

I will divide my values as well and I really hope it will do the trick. If not, I will try training on another dataset, and hopefully I will figure out what has been wrong with this one.

Thank you again for your patience, explanations and help! ☺️

klingner commented 3 years ago

Sorry, this is clearly an error in the comments from my side. It should be Intrinsic camera matrix :)