lufficc / SSD

High quality, fast, modular reference implementation of SSD in PyTorch
MIT License

Training COCO on subset #134

Closed giuordy closed 4 years ago

giuordy commented 4 years ago

Hi, I would like to train the model on a subset of COCO, using fewer classes (e.g. 4) instead of all of them. I modified the `__init__` code of the COCO dataset like this:

    from pycocotools.coco import COCO
    self.coco = COCO(ann_file)
    self.data_dir = data_dir
    self.transform = transform
    self.target_transform = target_transform
    self.remove_empty = remove_empty
    if class_ids:
        self.ids = []
        for id in class_ids:
            self.ids.extend(list(self.coco.getImgIds(catIds=[id])))
        # Remove duplicates
        self.ids = list(set(self.ids))
    else:
        if self.remove_empty:
            # when training, images without annotations are removed.
            self.ids = list(self.coco.imgToAnns.keys())
        else:
            # when testing, all images used.
            self.ids = list(self.coco.imgs.keys())
    print(self.ids)
    if class_ids:
        coco_categories = class_ids
    else:
        coco_categories = sorted(self.coco.getCatIds())
    self.coco_id_to_contiguous_id = {coco_id: i + 1 for i, coco_id in enumerate(coco_categories)}
    self.contiguous_id_to_coco_id = {v: k for k, v in self.coco_id_to_contiguous_id.items()}

When I try to train, I get this error:

    Traceback (most recent call last):
      File "train.py", line 118, in <module>
        main()
      File "train.py", line 109, in main
        model = train(cfg, args, class_ids=[1, 3, 62, 67])
      File "train.py", line 48, in train
        model = do_train(cfg, model, train_loader, optimizer, scheduler, checkpointer, device, arguments, args)
      File "F:\GIORDANO\SSD\SSD\ssd\engine\trainer.py", line 74, in do_train
        for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
        data = self._next_data()
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
        return self._process_data(data)
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
        data.reraise()
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\_utils.py", line 394, in reraise
        raise self.exc_type(msg)
    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "C:\Users\Tesista\Miniconda3\envs\SSD\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "F:\GIORDANO\SSD\SSD\ssd\data\datasets\coco.py", line 60, in __getitem__
        boxes, labels = self._get_annotation(image_id)
      File "F:\GIORDANO\SSD\SSD\ssd\data\datasets\coco.py", line 88, in _get_annotation
        labels = np.array([self.coco_id_to_contiguous_id[obj["category_id"]] for obj in ann], np.int64).reshape((-1,))
      File "F:\GIORDANO\SSD\SSD\ssd\data\datasets\coco.py", line 88, in <listcomp>
        labels = np.array([self.coco_id_to_contiguous_id[obj["category_id"]] for obj in ann], np.int64).reshape((-1,))
    KeyError: 49

Can someone help me?

rebeen commented 4 years ago

Hello, sorry, have you found a solution to this?

Thank you

lufficc commented 4 years ago

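The `KeyError` comes from the unexpected category 49. I think the images that contain categories 1, 3, 62, or 67 also contain objects from other categories, and those category IDs are missing from `coco_id_to_contiguous_id`. You can filter those annotations out when loading the annotations: https://github.com/lufficc/SSD/blob/master/ssd/data/datasets/coco.py#L65-L76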
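A minimal sketch of that kind of filtering, assuming annotations are plain COCO-format dicts with `category_id`, `bbox` as `[x, y, width, height]`, and `iscrowd`. The helper names `build_label_map` and `filter_annotations` are hypothetical and are not part of the repository. The idea is to drop every annotation whose category is not in the selected subset before the label lookup, so an unexpected ID such as 49 never reaches the dictionary.

```python
def build_label_map(class_ids):
    """Map the selected COCO category IDs to contiguous labels 1..N (0 is background)."""
    return {coco_id: i + 1 for i, coco_id in enumerate(class_ids)}


def filter_annotations(anns, coco_id_to_contiguous_id):
    """Hypothetical helper: keep only annotations whose category is in the mapping.

    Returns boxes as [x_min, y_min, x_max, y_max] lists and contiguous labels.
    Skipping crowd annotations is an assumption, not a requirement of the fix.
    """
    boxes, labels = [], []
    for obj in anns:
        if obj.get("iscrowd", 0):
            continue  # assumption: crowd regions are ignored
        if obj["category_id"] not in coco_id_to_contiguous_id:
            continue  # e.g. category 49 when only [1, 3, 62, 67] were selected
        x, y, w, h = obj["bbox"]
        boxes.append([x, y, x + w, y + h])
        labels.append(coco_id_to_contiguous_id[obj["category_id"]])
    return boxes, labels


# Made-up annotations for one image, for illustration only.
label_map = build_label_map([1, 3, 62, 67])
anns = [
    {"category_id": 1, "bbox": [10, 20, 30, 40], "iscrowd": 0},
    {"category_id": 49, "bbox": [5, 5, 10, 10], "iscrowd": 0},
    {"category_id": 67, "bbox": [0, 0, 50, 60], "iscrowd": 0},
]
boxes, labels = filter_annotations(anns, label_map)
print(boxes, labels)
```

In the dataset class, the same membership check could be applied to the annotation list inside `_get_annotation` before the labels are built. An image can still end up with no boxes after filtering if its only in-subset annotations are crowd regions, so that case may need handling as well.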