lkeab / BCNet

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [CVPR 2021]
https://arxiv.org/abs/2103.12340
MIT License

Error when starting training on custom dataset #116

Closed · MathijsNL closed this issue 2 years ago

MathijsNL commented 2 years ago

When using a custom COCO-formatted dataset, after processing it with process.sh, the following error is thrown:

[06/27 11:14:57 d2.engine.train_loop]: Starting training from iteration 0
[06/27 11:15:01 d2.engine.hooks]: Total training time: 0:00:04 (0:00:00 on hooks)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [23], in <module>
      3 trainer = Trainer(cfg)
      4 trainer.resume_or_load(resume=False)
----> 5 trainer.train()

File c:\users\user\BCNet\detectron2\engine\defaults.py:373, in DefaultTrainer.train(self)
    366 def train(self):
    367     """
    368     Run training.
    369 
    370     Returns:
    371         OrderedDict of results, if evaluation is enabled. Otherwise None.
    372     """
--> 373     super().train(self.start_iter, self.max_iter)
    374     if hasattr(self, "_last_eval_results") and comm.is_main_process():
    375         verify_results(self.cfg, self._last_eval_results)

File c:\users\user\BCNet\detectron2\engine\train_loop.py:131, in TrainerBase.train(self, start_iter, max_iter)
    129     for self.iter in range(start_iter, max_iter):
    130         self.before_step()
--> 131         self.run_step()
    132         self.after_step()
    133 finally:

File c:\users\user\BCNet\detectron2\engine\train_loop.py:205, in SimpleTrainer.run_step(self)
    201 start = time.perf_counter()
    202 """
    203 If your want to do something with the data, you can wrap the dataloader.
    204 """
--> 205 data = next(self._data_loader_iter)
    206 data_time = time.perf_counter() - start
    208 """
    209 If your want to do something with the losses, you can wrap the model.
    210 """

File c:\users\user\BCNet\detectron2\data\common.py:139, in AspectRatioGroupedDataset.__iter__(self)
    138 def __iter__(self):
--> 139     for d in self.dataset:
    140         w, h = d["width"], d["height"]
    141         bucket_id = 0 if w > h else 1

File c:\pyenvs\bcenv\lib\site-packages\torch\utils\data\dataloader.py:521, in _BaseDataLoaderIter.__next__(self)
    519 if self._sampler_iter is None:
    520     self._reset()
--> 521 data = self._next_data()
    522 self._num_yielded += 1
    523 if self._dataset_kind == _DatasetKind.Iterable and \
    524         self._IterableDataset_len_called is not None and \
    525         self._num_yielded > self._IterableDataset_len_called:

File c:\pyenvs\bcenv\lib\site-packages\torch\utils\data\dataloader.py:1203, in _MultiProcessingDataLoaderIter._next_data(self)
   1201 else:
   1202     del self._task_info[idx]
-> 1203     return self._process_data(data)

File c:\pyenvs\bcenv\lib\site-packages\torch\utils\data\dataloader.py:1229, in _MultiProcessingDataLoaderIter._process_data(self, data)
   1227 self._try_put_index()
   1228 if isinstance(data, ExceptionWrapper):
-> 1229     data.reraise()
   1230 return data

File c:\pyenvs\bcenv\lib\site-packages\torch\_utils.py:434, in ExceptionWrapper.reraise(self)
    430 except TypeError:
    431     # If the exception takes multiple arguments, don't try to
    432     # instantiate since we don't know how to
    433     raise RuntimeError(msg) from None
--> 434 raise exception

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "c:\pyenvs\bcenv\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "c:\pyenvs\bcenv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "c:\pyenvs\bcenv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "c:\users\user\BCNet\detectron2\data\common.py", line 40, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "c:\users\user\BCNet\detectron2\data\dataset_mapper.py", line 133, in __call__
    instances = utils.annotations_to_instances(
  File "c:\users\user\BCNet\detectron2\data\detection_utils.py", line 277, in annotations_to_instances
    bo_segms = [obj["bg_object_segmentation"] for obj in annos]
  File "c:\users\user\BCNet\detectron2\data\detection_utils.py", line 277, in <listcomp>
    bo_segms = [obj["bg_object_segmentation"] for obj in annos]
KeyError: 'bg_object_segmentation'

Are there any additional steps needed to train this on a custom dataset? I verified that the processing completed correctly, and each instance seems to have bg_object_segmentation added, although some are empty.

Some examples of bg_object_segmentation values:

"bg_object_segmentation": []
"bg_object_segmentation": [[735.0, 837.0, 734.0, 827.0, 734.0, 817.0, 734.0, 807.0...

Could this be related to the register_coco_instances call I use to register the dataset?

from detectron2.data.datasets import register_coco_instances
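
For reference, the registration itself looks roughly like this (the dataset name and paths are placeholders, not my real ones):

from detectron2.data.datasets import register_coco_instances

# Placeholder name and paths; the json is the one produced by process.sh.
# Signature: register_coco_instances(name, metadata, json_file, image_root)
register_coco_instances(
    "my_dataset_train",
    {},
    "datasets/my_dataset/annotations_train.json",
    "datasets/my_dataset/train_images",
)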
lkeab commented 2 years ago

This error indicates that some annotation does not have the "bg_object_segmentation" mask annotation. Even if it is sometimes empty, the key still needs to be present in every produced annotation.
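
A quick way to verify that is to scan the produced json directly (minimal sketch; the file path is a placeholder):

import json

# Every annotation must carry the "bg_object_segmentation" key;
# an empty list is acceptable, a missing key is not.
with open("annotations_train.json") as f:  # placeholder path
    coco = json.load(f)

annos = coco["annotations"]
missing = [a.get("id") for a in annos if "bg_object_segmentation" not in a]
print(f"{len(missing)} of {len(annos)} annotations are missing the key")
if missing:
    print("first offending annotation ids:", missing[:10])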

MathijsNL commented 2 years ago

I checked the dataset; every annotation has bg_object_segmentation. Replacing the COCO json with my own annotation file and using train_net.py works without any errors.

I think registering new datasets via register_coco_instances just doesn't load bg_object_segmentation (even though it is definitely there for all annotations), but that might be out of scope for this repo and would have to be added in detectron2.
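
For anyone else running into this: upstream detectron2's load_coco_json drops any annotation key it does not recognize unless the key is passed via extra_annotation_keys, and register_coco_instances does not expose that argument. Assuming this repo's detectron2 fork kept that argument, registering through DatasetCatalog directly might work (names and paths below are placeholders):

from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import load_coco_json

json_file = "datasets/my_dataset/annotations_train.json"  # placeholder
image_root = "datasets/my_dataset/train_images"           # placeholder

# Ask the loader to keep the extra BCNet key instead of discarding it.
DatasetCatalog.register(
    "my_dataset_train",
    lambda: load_coco_json(
        json_file,
        image_root,
        dataset_name="my_dataset_train",
        extra_annotation_keys=["bg_object_segmentation"],
    ),
)
MetadataCatalog.get("my_dataset_train").set(
    json_file=json_file, image_root=image_root, evaluator_type="coco"
)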