sharkDDD closed this issue 1 week ago
Hello, thanks for your interest.
You can try SAM or CLIP as the backbone; both are more powerful than an ImageNet pre-trained model.
Also, self-supervised pre-training of the backbone (e.g., with MAE or MoCo) on an industrial dataset is a possible solution.
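A rough way to check whether a different backbone separates the industrial images better is to compare the mean pairwise cosine similarity of its features: values close to 1 suggest the features have collapsed together. The sketch below is only an illustration and is not part of UniAD. The function names are hypothetical, and the toy vectors stand in for real backbone outputs (for example, pooled feature maps flattened to lists).

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_cosine(features):
    """Average cosine similarity over all unordered pairs of feature vectors."""
    pairs = list(combinations(features, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy stand-ins for per-image features (assumed values, not real measurements).
collapsed = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.9, 1.0, 0.0]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print("collapsed features:", mean_pairwise_cosine(collapsed))
print("spread features:", mean_pairwise_cosine(spread))
```

Running the same check on features from each candidate backbone, over the same set of images, gives a simple number to compare before committing to retraining.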
Thank you very much for your work. I have a question:
In the paper, it is mentioned that features are extracted using a backbone pre-trained on ImageNet, which are then used as input for UniAD. In my application scenario, due to the significant distribution difference between the industrial dataset and the ImageNet image dataset, the features extracted using the ImageNet pre-trained model are very similar and almost indistinguishable. What is your understanding of this issue?