Data Preparation for COCO Dataset

chentao2016 commented 4 years ago

Thanks for your great work!

Could you give me some instructions on data preparation for the COCO dataset?

kaixin96 commented 4 years ago

Hi @chentao2016, COCO already provides the information we need (e.g. the ids of all images that contain a certain class), so no preparation is needed. All preprocessing is handled by the COCO Dataset class.

Thank you.

chentao2016 commented 4 years ago

Thank you for your reply. When I train and test on the COCO dataset, I sometimes hit the following error ("OSError: image file is truncated") without warning. As shown below, it happened during training, right after step 23300.

step 23200: loss: 0.2685678899172565, align_loss: 0.18199997818072525
step 23300: loss: 0.2685404613065681, align_loss: 0.18187280085978194
ERROR - PANet - Failed after 0:43:37!
Traceback (most recent calls WITHOUT Sacred internals):
  File "train.py", line 83, in main
    for i_iter, sample_batched in enumerate(trainloader):
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/farm/ct/new_fewshot/PANet/dataloaders/common.py", line 166, in __getitem__
    for dataset_idx, data_idx in self.indices[idx]]
  File "/farm/ct/new_fewshot/PANet/dataloaders/common.py", line 166, in <listcomp>
    for dataset_idx, data_idx in self.indices[idx]]
  File "/farm/ct/new_fewshot/PANet/dataloaders/common.py", line 197, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/farm/ct/new_fewshot/PANet/dataloaders/coco.py", line 77, in __getitem__
    sample = self.transforms(sample)
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/farm/ct/new_fewshot/PANet/dataloaders/transforms.py", line 49, in __call__
    img = tr_F.resize(img, self.size)
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 255, in resize
    return img.resize(size[::-1], interpolation)
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/PIL/Image.py", line 1886, in resize
    self.load()
  File "/farm/ct/anaconda3/envs/pt13/lib/python3.6/site-packages/PIL/ImageFile.py", line 249, in load
    "(%d bytes not processed)" % len(b)
OSError: image file is truncated (7 bytes not processed)

kaixin96 commented 4 years ago

That could be caused by a corrupted image file. You can wrap the image loading in try/except to catch the error and record the image_id, then check whether that particular image can be opened in Python without error.
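One way to act on this suggestion is to scan the image files once, outside the training loop, and collect every path that fails to decode. The following is only a minimal sketch, not code from the PANet repository: the helper names `pil_load` and `find_unreadable` are hypothetical, and it assumes Pillow is installed and that the images are available as a list of file paths.

```python
def pil_load(path):
    """Fully decode one image with Pillow (assumed installed).

    Image.open() is lazy and only reads the header, so load() is called
    to force decoding; a truncated file raises OSError at that point.
    """
    from PIL import Image  # lazy import keeps the scan logic usable on its own
    with Image.open(path) as img:
        img.load()


def find_unreadable(paths, load=pil_load):
    """Return a list of (path, error message) for files that fail to load.

    `load` is any callable that raises on a bad file; it defaults to the
    Pillow-based loader above.
    """
    bad = []
    for path in paths:
        try:
            load(path)
        except (OSError, SyntaxError) as exc:  # Pillow reports broken files with these
            bad.append((path, str(exc)))
    return bad


# Hypothetical usage; the directory name is an assumption:
#   import glob
#   for path, err in find_unreadable(glob.glob("train2014/*.jpg")):
#       print(path, err)
```

Once a bad file is identified, downloading or extracting it again usually resolves the problem. Pillow also offers `ImageFile.LOAD_TRUNCATED_IMAGES = True`, which makes it decode truncated files instead of raising, but that hides the corruption rather than fixing it.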

Here are some links I found that might be useful.

FYI, my PIL version is

PIL.VERSION=1.1.7
PIL.PILLOW_VERSION=5.4.1

Thank you.

chentao2016 commented 4 years ago

Following your guidance, I found the corrupted image that caused this error. Thanks!

I notice that COCO contains 80 classes, but the ids range from 1 to 90 (some ids are skipped, for example 12). So I was wondering whether the following code really selects the 80 classes that you want.

'COCO': {
    'all': set(range(1, 81)),
    0: set(range(1, 81)) - set(range(1, 21)),
    1: set(range(1, 81)) - set(range(21, 41)),
    2: set(range(1, 81)) - set(range(41, 61)),
    3: set(range(1, 81)) - set(range(61, 81)),
}

kaixin96 commented 4 years ago

The config uses contiguous label indices (1-80); the mapping from those label indices to the COCO class_id (1-90, with gaps) is done here.
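To illustrate the idea, here is a minimal sketch of such a mapping. It is not the repository's actual code: `build_label_to_cat_id` is a hypothetical helper, and the list of skipped ids is written out by hand as an assumption about the standard COCO detection annotations. In practice the id list would be read from the annotation file, for example with `COCO(annotation_file).getCatIds()` from pycocotools.

```python
# Ids in 1..90 that have no category in the standard COCO detection
# annotations. Hard-coded here as an assumption for illustration only;
# read the real list from the annotation file in actual code.
MISSING_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}


def build_label_to_cat_id(cat_ids):
    """Map contiguous label indices 1..N to the sorted category ids."""
    return {label: cat_id
            for label, cat_id in enumerate(sorted(cat_ids), start=1)}


cat_ids = [i for i in range(1, 91) if i not in MISSING_IDS]
label_to_cat_id = build_label_to_cat_id(cat_ids)

# Fold 0 from the config quoted earlier in this thread keeps labels 21..80.
fold0_labels = set(range(1, 81)) - set(range(1, 21))
fold0_cat_ids = sorted(label_to_cat_id[label] for label in fold0_labels)

print(len(label_to_cat_id))  # number of classes
print(label_to_cat_id[12])   # category id behind label 12
```

With a mapping like this, the fold definitions can stay in terms of `set(range(1, 81))`, and the gaps in the COCO ids are handled in one place.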

Thank you.

chentao2016 commented 4 years ago

Ok, I get it now, thanks. Nice code!

kaixin96 / PANet

Data Preparation for COCO Dataset #13