Fiorentinar closed this issue 2 months ago
Hello @Fiorentinar, here is some advice for fine-tuning:
The other settings do not need to be changed.
Thank you very much for the tips you've provided. I'm particularly curious about the following two points:
In the scenario I described, would freezing the text encoder affect fine-tuning on the category labels (especially labels that are uncommon in the pre-training datasets)?
For small object detection, do I need to modify the network structure, similar to the change from YOLOv8 to YOLOv8-p2 (where lower-level, higher-resolution backbone features are also fed into the neck and head)?
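For readers unfamiliar with the YOLOv8-p2 comparison, here is a rough sketch of why an extra high-resolution feature level matters for small objects. It assumes the usual stride convention (P3/P4/P5 at strides 8/16/32, with P2 at stride 4) and only does the arithmetic; the function name `cells_per_side` is made up for illustration and is not part of OV-DINO or YOLOv8.

```python
# Strides assumed for each pyramid level (P2 is the extra level in a p2-style model).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def cells_per_side(object_px: float) -> dict:
    """Return how many feature-map cells one side of an object spans at each level.

    A value below 1.0 means the object is smaller than a single cell at that
    level, so its features are mixed with background.
    """
    return {level: object_px / stride for level, stride in STRIDES.items()}


if __name__ == "__main__":
    # A 16-pixel-wide object spans 2 cells at stride 8 but 4 cells at stride 4.
    for level, cells in cells_per_side(16).items():
        print(f"{level}: {cells:.1f} cells per side")
```

Under these assumptions, a 16-pixel object covers only a 2x2 patch at the finest standard level, which is the motivation for the question about adding a lower-level feature map.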
@Fiorentinar Regarding your question:
Thank you for your patient response; all my questions have been answered.
Thank you very much for your work!
I currently have a dataset with the following characteristics: high resolution (1920×1080), a very limited amount of data (~500 frames), and specific category names (such as 'bottom of trunk' or 'top of the traffic sign'). All of the targets are small objects. In short, the dataset differs significantly from all of the pre-training datasets.
If I want to fine-tune OV-DINO on this dataset, are there any techniques or tips that would help bridge the gaps in resolution, data amount, object size, and category names (or at least some of these gaps)? Thanks a lot!
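To make the resolution gap concrete, here is a small sketch of how much a small object shrinks when a 1920×1080 frame is resized for training. The resize rule (shortest edge 800, longest edge capped at 1333) is an assumption borrowed from common detection configs, not a confirmed OV-DINO setting, and `resized_object_size` is a hypothetical helper.

```python
def resized_object_size(obj_w, obj_h, img_w=1920, img_h=1080,
                        short_edge=800, max_long_edge=1333):
    """Return (new_w, new_h, scale) for an object after a shortest-edge resize.

    Assumed rule: scale so the shorter image side equals `short_edge`,
    unless that would push the longer side past `max_long_edge`.
    """
    scale = min(short_edge / min(img_w, img_h),
                max_long_edge / max(img_w, img_h))
    return obj_w * scale, obj_h * scale, scale


if __name__ == "__main__":
    w, h, s = resized_object_size(20, 20)
    print(f"scale={s:.3f}, 20x20 px object becomes {w:.1f}x{h:.1f} px")
```

With these assumed values the long-edge cap is the binding constraint, so every object loses roughly 30% of its side length before it reaches the backbone. Checking the actual input-size settings in the fine-tuning config would show whether this applies.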