The index exceeds the dataset itself

sunhucheng commented 1 year ago

Hi noahzn, sorry to bother you again! I ran train.py in PyCharm and the program reported an error: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\kitti_data\2011_09_26/2011_09_26_drive_0002_sync\image_03/data\0000000077.png'

But kitti_data\2011_09_26/2011_09_26_drive_0002_sync\image_03/data contains only 76 images.

Sometimes the program reports this error instead: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\kitti_data\2011_09_26/2011_09_26_drive_0001_sync\image_02/data\0000000108.png'

But kitti_data\2011_09_26/2011_09_26_drive_0001_sync\image_02/data contains only 107 images.

The errors are all of the same kind, so I think this is a common problem. Have you ever run into this situation? Do you know the reason for the error?

sunhucheng commented 1 year ago

Training model named: lite-mono
Models and tensorboard events files are saved to: ./tmp
Training is using: cuda
Using split: eigen_zhou
There are 133 training items and 30 validation items

Training
epoch 0 | lr 0.000100 |lr_p 0.000100 | batch 0 | examples/s: 2.0 | loss: 0.15695 | time elapsed: 00h00m04s | time left: 00h00m00s
epoch 0 | lr 0.000100 |lr_p 0.000100 | batch 5 | examples/s: 25.6 | loss: 0.14992 | time elapsed: 00h00m08s | time left: 01h38m10s
epoch 0 | lr 0.000100 |lr_p 0.000100 | batch 10 | examples/s: 21.3 | loss: 0.15792 | time elapsed: 00h00m12s | time left: 01h11m14s
Traceback (most recent call last):
  File "C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\train.py", line 12, in <module>
    trainer.train()
  File "C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\trainer.py", line 223, in train
    self.run_epoch()
  File "C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\trainer.py", line 238, in run_epoch
    for batch_idx, inputs in enumerate(self.train_loader):
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\utils\data\dataloader.py", line 634, in __next__
    data = self._next_data()
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\utils\data\dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\utils\data\dataloader.py", line 1372, in _process_data
    data.reraise()
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\_utils.py", line 644, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\CODE\ANACONDA\envs\facenet\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\datasets\mono_dataset.py", line 153, in __getitem__
    inputs[("color", i, -1)] = self.get_color(folder, frame_index + i, side, do_flip)
  File "C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\datasets\kitti_dataset.py", line 39, in get_color
    color = self.loader(self.get_image_path(folder, frame_index, side))
  File "C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\datasets\mono_dataset.py", line 15, in pil_loader
    with open(path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\kitti_data\2011_09_26/2011_09_26_drive_0001_sync\image_02/data\0000000108.png'

After the above error is reported, the program keeps running for a while, and then a similar error is reported again.
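The traceback shows mono_dataset.py requesting frame_index + i, so each sample needs neighboring frames in addition to the frame named in the split file. The following is a minimal sketch of that arithmetic, not the Lite-Mono implementation. It assumes neighbor offsets of 0, -1 and +1 and the ten-digit zero-padded filenames seen in the error paths; the helper names and the 108-frame folder are hypothetical.

```python
def neighbor_filenames(frame_index, offsets=(0, -1, 1), ext=".png"):
    """Filenames one sample would need: the center frame plus its neighbors.

    The offsets are an assumption for illustration, not read from the repo.
    """
    return [f"{frame_index + off:010d}{ext}" for off in offsets]


def missing_for_sample(frame_index, available, offsets=(0, -1, 1), ext=".png"):
    """Return the required filenames that are absent from `available`."""
    return [
        name
        for name in neighbor_filenames(frame_index, offsets, ext)
        if name not in available
    ]


# Hypothetical folder holding frames 0000000000.png .. 0000000107.png.
available = {f"{i:010d}.png" for i in range(108)}

print(missing_for_sample(50, available))   # []
print(missing_for_sample(107, available))  # ['0000000108.png']
```

Under these assumptions, a split entry that points at the last frame of a sequence asks for a file one index past the end of the folder, which matches the pattern of the errors above.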

noahzn commented 1 year ago

Hi,

Why do you only have 133 training items and 30 validation items? If you use a subset of KITTI, please ensure that your training lists only contain these files. Besides, I notice you are using .png files to train. The default training uses .jpg files, so you need to convert .png to .jpg.

sunhucheng commented 1 year ago

Hi, I know what you mean. I do use the KITTI dataset, but my local hard drive is not large enough, so I want to run with a small dataset first and then use the full dataset on the server later. I only kept 2011_09_26/2011_09_26_drive_0001_sync and 2011_09_26/2011_09_26_drive_0002_sync, and I modified eigen_zhou so that it only contains the entries for these two sequences. The data directory format is the same as the one you sent me in issue #41, which I mentioned earlier, and the data I kept for these two sequences is exactly the same as in the complete dataset.

The counts add up: 107 + 76 = 133 + 30 + 20, that is, 133 for training in eigen_zhou, 30 for validation in eigen_zhou, and 20 for testing in eigen. In other words, the complete dataset has only 107 pictures in 2011_09_26/2011_09_26_drive_0001_sync and only 76 pictures in 2011_09_26/2011_09_26_drive_0002_sync. So I don't think it's a problem with my dataset. Is there something wrong with the code?

Regarding the picture format you mentioned, I have set png as the default picture format in trainer.py, so this is not the problem. I don't know if I have made myself clear. This problem is indeed a tricky one, and it's hard to solve without seeing the actual situation. I will keep trying, and I will modify the conda environment as you mentioned in issue #41.
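When a split file is trimmed by hand, one way to check it is to keep only the entries whose center frame and neighboring frames all exist on disk. The sketch below is illustrative only and is not part of Lite-Mono. It assumes each split line has the form "folder frame_index side", that sides l and r map to image_02 and image_03, that the neighbor offsets are 0, -1 and +1, and that images live under folder/image_0X/data/ as in the error paths. The names usable_entries, image_path and SIDE_TO_CAMERA are hypothetical.

```python
from pathlib import Path

# Assumed mapping from the side field of a split line to the camera folder.
SIDE_TO_CAMERA = {"l": 2, "r": 3, "2": 2, "3": 3}


def image_path(data_root, folder, frame_index, side, ext=".png"):
    """Build the expected image path for one frame (layout is an assumption)."""
    camera = f"image_0{SIDE_TO_CAMERA[side]}"
    return Path(data_root) / folder / camera / "data" / f"{frame_index:010d}{ext}"


def usable_entries(lines, data_root, offsets=(0, -1, 1), ext=".png"):
    """Keep split lines whose center and neighbor frames all exist on disk."""
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        folder, index, side = parts[0], int(parts[1]), parts[2]
        needed = [index + off for off in offsets]
        if all(
            n >= 0 and image_path(data_root, folder, n, side, ext).is_file()
            for n in needed
        ):
            kept.append(line.strip())
    return kept
```

A possible use is to read the trimmed training list, pass its lines together with the kitti_data directory to usable_entries, and write the returned lines back out. Under the stated assumptions, entries that point at the first or last frame of a sequence are dropped because one of their neighbors does not exist.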

noahzn commented 1 year ago

The error you have encountered is due to missing data. It is not related to the code.

sunhucheng commented 1 year ago

OK, I will run the code on the full dataset on the server. Thank you for your reply.

noahzn / Lite-Mono

The index exceeds the dataset itself #45