ifnspaml / SGDepth

[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
MIT License

Training error #9

Closed Ale0311 closed 3 years ago

Ale0311 commented 3 years ago

Hello!

I get this error at the beginning of the first epoch:

```
Traceback (most recent call last):
  File "/home/diogene/Documents/Alexandra/SGDepth-master/train.py", line 228, in _run_epoch
    for batch_idx, batch in enumerate(self.train_loaders):
  File "/home/diogene/Documents/Alexandra/SGDepth-master/loaders/__init__.py", line 25, in __iter__
    for batch_idx, group in zip(length_iter, infinite_iters):
  File "/home/diogene/Documents/Alexandra/SGDepth-master/loaders/__init__.py", line 30, in _make_infinite
    for batch in loader:
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/basedataset.py", line 182, in __getitem__
    sample = self.load_transforms(sample)
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/dataloader/pt_data_loader/mytransforms.py", line 61, in __call__
    sample[key] = pil.fromarray(sample[key])
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/PIL/Image.py", line 2642, in fromarray
    arr = obj.__array_interface__
AttributeError: 'NoneType' object has no attribute '__array_interface__'
```

This was the logged summary:

```
Starting initialization
Loading training dataset metadata:
/home/diogene/Documents/Alexandra/SGDepth-master/Dataset/kitti_zhou_split/train.json
/home/diogene/Documents/Alexandra/SGDepth-master/Dataset/kitti_zhou_split/train.json
```

Do you have any idea why I get this error? Thanks in advance!

klingner commented 3 years ago

Hi @Ale0311

I think the important lines are `sample[key] = pil.fromarray(sample[key])` and `AttributeError: 'NoneType' object has no attribute '__array_interface__'`. This suggests that the sample does not contain an image but a `None` object. I would suspect that the dataset structure is not as described, or that parts of the dataset are missing. To trace back which image is causing this error, you could debug the `__getitem__()` function of the `BaseDataset` class in `dataloader/pt_data_loader/basedataset.py`. There I would check whether the paths that the images are loaded from actually exist on your system.

Alternatively, you could try to train with a reduced number of loaders for depth and semantic segmentation by setting `--depth-validation-loaders ""`, `--segmentation-validation-loaders ""`, `--depth-training-loaders ""`, and/or `--segmentation-training-loaders ""` in the argument parser. If the error disappears, then you know that this loader/dataset caused the error.
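A sketch of such a reduced run might look like this. Only the loader flags come from the comment above; the remaining arguments are placeholders for whatever options you normally train with:

```shell
# Hypothetical invocation: disable the segmentation loaders to test whether
# the error comes from the Cityscapes side. --model-name is a placeholder.
python3 train.py \
    --model-name sgdepth_debug \
    --segmentation-training-loaders "" \
    --segmentation-validation-loaders ""
```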

Ale0311 commented 3 years ago

Thank you for your response.

Meanwhile I managed to fix the error. The problem was that I hadn't downloaded the `gtFine` folder from the Cityscapes dataset.

It is working now.

But just to be safe, are there any other folders I need to download for Cityscapes? In the Cityscapes folder I have the .json files you provided and the downloaded `gtFine` and `leftImg8bit` folders, both with train, val and test folders inside. Do I also need the trainextra archive?

Thank you again for your help and response! 😊

klingner commented 3 years ago

That sounds good!

For Cityscapes just the 2975 training images + segmentation labels are used, so you do not have to download the trainextra archive. Just make sure that, if you want to run the evaluation on the KITTI 2015 dataset, you have downloaded it and preprocessed the disparity maps into depth maps.
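For reference, converting disparity to depth for a rectified stereo pair follows depth = f · b / d. A minimal sketch, assuming the approximate KITTI calibration (focal length ≈ 721.5 px, baseline ≈ 0.54 m; check the per-sequence calibration files for exact values), and not necessarily the preprocessing this repository expects:

```python
import numpy as np


def disparity_to_depth(disparity, focal_px=721.5377, baseline_m=0.54):
    """Convert a disparity map (in pixels) to metric depth in meters.

    depth = focal_px * baseline_m / disparity for valid pixels;
    zero disparities (invalid pixels) are mapped to depth 0.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

Note that KITTI 2015 ground-truth disparity PNGs are stored as uint16 scaled by 256, so divide the raw values by 256.0 before applying the conversion.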

Otherwise, I think everything should be working without further dataset downloads.

Ale0311 commented 3 years ago

Thanks again! Yesterday I started the training script and I have just checked the results:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---------|--------|------|----------|------|------|------|
| 0.117 | 0.883 | 4.752 | 0.193 | 0.872 | 0.959 | 0.981 |

I think everything is OK, because the results are really close to those you presented.