Retrain with new images

timday23 commented 11 months ago

How can I correctly upload images to retrain the model? The new mask images are 1-bit binary images with the same dimensions as the original training images, but when I use my new training/validation data in the training demo, I get this error: IndexError: index 2 is out of bounds for dimension 0 with size 2.

I have followed the structure for the image file paths and names, so I don't know why I am getting this error.

maxfrei750 commented 11 months ago

It's hard to diagnose this without the data you're using, but dimension 0 should be the channel dimension. Could you please check how many dimensions your images have? They might be binary but still have 3 dimensions (i.e. be RGB images).
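One way to check this without opening the images in an editor is to read the file header directly. The following is only a sketch, assuming the images are stored as PNG files (the thread does not state the file format); the function name `describe_png` is made up for this example and is not part of paddle.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Samples per pixel for each PNG color type, as defined in the PNG specification:
# 0 = grayscale, 2 = RGB, 3 = palette index, 4 = grayscale + alpha, 6 = RGBA.
CHANNELS_BY_COLOR_TYPE = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def describe_png(path):
    """Return width, height, bit depth and channel count from a PNG header.

    Hypothetical helper for illustration; assumes `path` points to a PNG file.
    """
    # Signature (8 bytes) + chunk length (4) + chunk type (4) + IHDR data (13) + CRC (4).
    header = Path(path).read_bytes()[:33]
    if len(header) < 33 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # IHDR starts with width and height (4 bytes each), then bit depth and color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "channels": CHANNELS_BY_COLOR_TYPE[color_type],
    }
```

A grayscale or 1-bit mask should report `channels` as 1, whereas an RGB file reports 3, even if every pixel in it is black or white.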

timday23 commented 11 months ago

Do you mean whether they are 1-bit, 8-bit, or 24-bit?

timday23 commented 11 months ago

Right now, the full images are 8-bit grayscale and the mask images are 1-bit black and white.

maxfrei750 commented 11 months ago

Please post the complete error stack that you receive.

timday23 commented 11 months ago

My methodology

1. Generate images in Blender (1024x768, 96 dpi, 8-bit, 256 colors).
2. Generate masks in Blender (1024x768, 96 dpi, 8-bit, 256 colors).
3. Convert the masks to binary images with a Python script (1-bit, 2 colors).
4. Split up the images and upload them to the "train2" and "valid2" folders.
5. Upload the corresponding binary masks to the "particle" folder within "train2" and "valid2".
6. Change the config to train_subset = train2 and val_subset = train2.
7. Run 02_train_model in Jupyter Notebook.

Am I missing something in these steps?

Here is the complete error stack:

IndexError                                Traceback (most recent call last)
File ~/Desktop/paddle/train_model.py:4
      1 from paddle.training import train_mask_rcnn
      3 if __name__ == "__main__":
----> 4     train_mask_rcnn()

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/main.py:33, in main.<locals>.main_decorator.<locals>.decorated_main(cfg_passthrough)
     30 args = get_args_parser()
     31 # no return value from run_hydra() as it may sometime actually run the task_function
     32 # multiple times (--multirun)
---> 33 _run_hydra(
     34     args_parser=args,
     35     task_function=task_function,
     36     config_path=config_path,
     37     config_name=config_name,
     38     strict=strict,
     39 )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:364, in _run_hydra(args_parser, task_function, config_path, config_name, strict)
    362     args.run = True
    363 if args.run:
--> 364     run_and_report(
    365         lambda: hydra.run(
    366             config_name=config_name,
    367             task_function=task_function,
    368             overrides=args.overrides,
    369         )
    370     )
    371 elif args.multirun:
    372     run_and_report(
    373         lambda: hydra.multirun(
    374             config_name=config_name,
   (...)
    377         )
    378     )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:215, in run_and_report(func)
    213 except Exception as ex:
    214     if _is_env_set("HYDRA_FULL_ERROR") or is_under_debugger():
--> 215         raise ex
    216     else:
    217         if isinstance(ex, CompactHydraException):

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:212, in run_and_report(func)
    210 def run_and_report(func: Any) -> Any:
    211     try:
--> 212         return func()
    213     except Exception as ex:
    214         if _is_env_set("HYDRA_FULL_ERROR") or is_under_debugger():

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:365, in _run_hydra.<locals>.<lambda>()
    362     args.run = True
    363 if args.run:
    364     run_and_report(
--> 365         lambda: hydra.run(
    366             config_name=config_name,
    367             task_function=task_function,
    368             overrides=args.overrides,
    369         )
    370     )
    371 elif args.multirun:
    372     run_and_report(
    373         lambda: hydra.multirun(
    374             config_name=config_name,
   (...)
    377         )
    378     )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/hydra.py:109, in Hydra.run(self, config_name, task_function, overrides, with_log_configuration)
    101 cfg = self.compose_config(
    102     config_name=config_name,
    103     overrides=overrides,
    104     with_log_configuration=with_log_configuration,
    105     run_mode=RunMode.RUN,
    106 )
    107 HydraConfig.instance().set_config(cfg)
--> 109 return run_job(
    110     config=cfg,
    111     task_function=task_function,
    112     job_dir_key="hydra.run.dir",
    113     job_subdir_key=None,
    114     configure_logging=with_log_configuration,
    115 )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/core/utils.py:129, in run_job(config, task_function, job_dir_key, job_subdir_key, configure_logging)
    126 _save_config(config.hydra.overrides.task, "overrides.yaml", hydra_output)
    128 with env_override(hydra_cfg.hydra.job.env_set):
--> 129     ret.return_value = task_function(task_cfg)
    130     ret.task_name = JobRuntime.instance().get("name")
    132 _flush_loggers()

File ~/Desktop/paddle/paddle/training.py:70, in train_mask_rcnn(config)
     57 callbacks = [
     58     ModelCheckpoint(**config.callbacks.model_checkpoint),
     59     EarlyStopping(**config.callbacks.early_stopping),
     60     LearningRateMonitor(),
     61     ExampleDetectionMonitor(**config.callbacks.example_detection_monitor),
     62 ]
     64 trainer = Trainer(
     65     callbacks=callbacks,
     66     logger=TensorBoardLogger(save_dir=str(log_root), name="", version=version),
     67     **config.trainer,
     68 )
---> 70 trainer.fit(model, datamodule=data_module)

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:510, in Trainer.fit(self, model, train_dataloader, val_dataloaders, datamodule)
    504 # ----------------------------
    505 # TRAIN
    506 # ----------------------------
    507 # hook
    508 self.call_hook('on_fit_start')
--> 510 results = self.accelerator_backend.train()
    511 self.accelerator_backend.teardown()
    513 # ----------------------------
    514 # POST-Training CLEAN UP
    515 # ----------------------------
    516 # hook

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py:57, in Accelerator.train(self)
     55 def train(self):
     56     self.trainer.setup_trainer(self.trainer.model)
---> 57     return self.train_or_test()

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py:74, in Accelerator.train_or_test(self)
     72 else:
     73     self.trainer.train_loop.setup_training()
---> 74     results = self.trainer.train()
     75 return results

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:532, in Trainer.train(self)
    531 def train(self):
--> 532     self.run_sanity_check(self.get_model())
    534     # set stage for logging
    535     self.logger_connector.set_stage("train")

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:731, in Trainer.run_sanity_check(self, ref_model)
    728 self.on_sanity_check_start()
    730 # run eval step
--> 731 _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
    733 # allow no returns from eval
    734 if eval_results is not None and len(eval_results) > 0:
    735     # when we get a list back, used only the last item

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:644, in Trainer.run_evaluation(self, max_batches, on_epoch)
    642 with self.profiler.profile("evaluation_step_and_end"):
    643     output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
--> 644     output = self.evaluation_loop.evaluation_step_end(output)
    646 # hook + store predictions
    647 self.evaluation_loop.on_evaluation_batch_end(output, batch, batch_idx, dataloader_idx)

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py:191, in EvaluationLoop.evaluation_step_end(self, *args, **kwargs)
    189     output = self.trainer.call_hook('test_step_end', *args, **kwargs)
    190 else:
--> 191     output = self.trainer.call_hook('validation_step_end', *args, **kwargs)
    192 return output

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:933, in Trainer.call_hook(self, hook_name, *args, **kwargs)
    931 if is_overridden(hook_name, model_ref):
    932     hook_fx = getattr(model_ref, hook_name)
--> 933     output = hook_fx(*args, **kwargs)
    935 # if the PL module doesn't have the hook then call the accelator
    936 # used to auto-reduce things for the user with Results obj
    937 elif hasattr(self.accelerator_backend, hook_name):

File ~/Desktop/paddle/paddle/lightning_modules.py:167, in LightningMaskRCNN.validation_step_end(self, output)
    162 """Calculate and log the validation_metrics.
    163
    164 :param output: Outputs of the validation step.
    165 """
    166 for metric_name, metric in self.validation_metrics.items():
--> 167     metric(output["predictions"], output["targets"])
    169     tag = f"validation/{metric_name}"
    171     if isinstance(metric, ConfusionMatrix):
    172         # TODO: Replace when https://github.com/PyTorchLightning/pytorch-lightning/pull/6227
    173         # has been merged.

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/torch/nn/modules/module.py:727, in Module._call_impl(self, *input, **kwargs)
    725     result = self._slow_forward(*input, **kwargs)
    726 else:
--> 727     result = self.forward(*input, **kwargs)
    728 for hook in itertools.chain(
    729         _global_forward_hooks.values(),
    730         self._forward_hooks.values()):
    731     hook_result = hook(self, input, result)

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py:154, in Metric.forward(self, *args, **kwargs)
    152 # add current step
    153 with torch.no_grad():
--> 154     self.update(*args, **kwargs)
    155 self._forward_cache = None
    157 if self.compute_on_step:

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py:200, in Metric._wrap_update.<locals>.wrapped_func(*args, **kwargs)
    197 @functools.wraps(update)
    198 def wrapped_func(*args, **kwargs):
    199     self._computed = None
--> 200     return update(*args, **kwargs)

File ~/Desktop/paddle/paddle/metrics/confusion_matrix.py:67, in ConfusionMatrix.update(self, predictions, targets)
     60 """Updates the confusion matrix based on the supplied targets and predictions.
     61
     62 :param predictions: List of dictionaries with prediction data, such as boxes and masks.
     63 :param targets: Tuple of dictionaries with target data, such as boxes and masks.
     64 """
     66 for prediction, target in zip(predictions, targets):
---> 67     confusion_matrix = self._evaluate_image(prediction, target)
     69     self.confmat += confusion_matrix

File ~/Desktop/paddle/paddle/metrics/confusion_matrix.py:139, in ConfusionMatrix._evaluate_image(self, prediction, target)
    137 label_pred = 0  # background class
    138 for label_gt in labels_gt[~is_assigned_gt]:
--> 139     confusion_matrix[label_gt, label_pred] += 1
    141 return confusion_matrix

IndexError: index 2 is out of bounds for dimension 0 with size 2

maxfrei750 commented 11 months ago

Thanks for posting the stack. I think that helped a lot...

Forget what I said about the image format.

--> 139 confusion_matrix[label_gt, label_pred] += 1
141 return confusion_matrix

IndexError: index 2 is out of bounds for dimension 0 with size 2

This happens during evaluation, where the code computes the confusion matrix for the different classes. The error says that the confusion matrix was initialized with 2 classes, which is the default: the first class is the background class (0) and the second is the particle class (1). However, index 2 means that your ground truth data contains a label of 2, i.e. more than two classes. The number of classes in the ground truth data is defined by the number of subfolders (e.g. "particle"). Do you happen to have more than one subfolder in your "train2" and/or "valid2" folder?
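To check the subfolders quickly, something like the following could be run against each subset folder. This is only a sketch based on the rule described above (one class per subfolder, plus the background class); the function names `class_folders` and `check_num_classes` are made up for this example and are not part of paddle.

```python
from pathlib import Path


def class_folders(subset_dir):
    """Return the names of all subfolders of a subset folder, sorted."""
    return sorted(p.name for p in Path(subset_dir).iterdir() if p.is_dir())


def check_num_classes(subset_dir, num_classes=2):
    """Raise if the folder implies more classes than the metrics expect.

    Hypothetical helper; `num_classes=2` mirrors the default mentioned above
    (background + particle).
    """
    folders = class_folders(subset_dir)
    found = len(folders) + 1  # +1 for the implicit background class (0)
    if found > num_classes:
        raise ValueError(
            f"{subset_dir}: found {found} classes (background + {folders}), "
            f"expected at most {num_classes}"
        )
    return folders
```

Calling `check_num_classes("train2")` and `check_num_classes("valid2")` from the data folder should return `["particle"]` if the layout is as intended. Hidden folders are listed as well, so any unexpected folder becomes visible.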

maxfrei750 commented 11 months ago

While we're at it: unfortunately, I can no longer provide much support for paddle, since the project has ended. Fortunately, the usability of Mask R-CNN for custom applications has improved a lot in the meantime. I'd therefore recommend switching to the mmdetection framework, which has a large community and therefore much more detailed documentation.

maxfrei750 commented 11 months ago

This is a good place to get started: https://mmdetection.readthedocs.io/en/latest/user_guides/train.html#train-with-customized-datasets

I understand if you don't want to switch right away, since you might already be close to getting paddle to run properly, so I can still try to help you sort this issue out. However, if you encounter further problems, I'd definitely advise making the switch.

timday23 commented 11 months ago

Thanks for the help, we were able to get it working.

maxfrei750 commented 11 months ago

Great! Then please give a short explanation of what caused the problem, in case others run into the same issue. :+1:

maxfrei750 / paddle

Retrain with new images #7
