How to Set Up Multi-Input for Object Detection Networks in MMDetection3

Hello, I'm trying to implement my object detection network with MMDetection3 (mmdet). The network has two input branches, so its backbone takes two inputs: img and an enhanced image, img_retinex. However, MMDet3 seems to pass a single inputs entry through every stage and, by default, supports only one input image. I've tried modifying the code, but the forward logic is implemented deep inside mmengine, and the framework as a whole doesn't seem to accommodate this. Do I need to change the entire architecture? Could you offer any reasonable suggestions?

In MMDet2 this was easy: I just listed the extra key in the pipeline (dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'img_retinex', 'gt_bboxes', 'gt_labels'])), but MMDet3 no longer seems to support this. Should I pack {img, img_retinex} into inputs and keep using MMDetection3's existing machinery, or should I pack only img into inputs and modify the functions to accept an additional input? I'm looking for the simplest approach.

In short, every MMDetection3 component appears to support only a single inputs entry. Could you provide some reasonable suggestions?
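
For reference, the first option (packing both images into inputs) could be sketched roughly as below. This is only an illustration of the idea, not MMDetection code: ConcatRetinex and split_branches are hypothetical names, the transform is written as a plain callable using NumPy so that it runs without mmdet installed, and 3 channels per image is an assumption. In a real project the transform would presumably be registered as a custom pipeline step placed before the packing step, and the split would happen at the start of the backbone's forward.

```python
import numpy as np


class ConcatRetinex:
    """Hypothetical pipeline step: stack img_retinex onto img channel-wise.

    Assumes both entries of the results dict are H x W x C arrays of the
    same shape, so the pipeline keeps carrying a single 'img' entry.
    """

    def __call__(self, results: dict) -> dict:
        img = results['img']
        img_retinex = results['img_retinex']
        if img.shape != img_retinex.shape:
            raise ValueError(
                f'shape mismatch: {img.shape} vs {img_retinex.shape}')
        # H x W x 3 and H x W x 3 -> H x W x 6
        results['img'] = np.concatenate([img, img_retinex], axis=2)
        return results


def split_branches(inputs, channels_per_branch=3):
    """Split an N x 2C x H x W batch back into two N x C x H x W inputs.

    Uses plain slicing, which behaves the same on NumPy arrays and on
    PyTorch tensors, so the same two lines could open a backbone forward.
    """
    img = inputs[:, :channels_per_branch]
    img_retinex = inputs[:, channels_per_branch:]
    return img, img_retinex
```

With this layout the rest of the pipeline still sees one image, so both branches receive the same geometric augmentation. One caveat I have not resolved: the default normalization (mean/std, bgr_to_rgb) assumes a 3-channel image, so it would probably have to be disabled or replaced for a 6-channel input, and each stock transform would need to be checked for 6-channel support.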

open-mmlab / mmdetection

How to Set Up Multi-Input for Object Detection Networks in MMDetection3 #12043