What was the design of the multi_scale before entering the backbone?

20231211 commented 4 weeks ago

What was the design of the multi_scale step applied before the backbone? On my dataset, training seems to work better without multi_scale.

I also have some questions about training length. How many epochs are typically needed? My training set was created by slicing larger images and contains about 16,000 images. After 72 epochs, best_stat_epoch=71; do I need to keep training? When I trained for 120 epochs, AP@0.5:0.95 improved, but AP@0.5 stayed almost the same.

I also have a question about which results to report. Because EMA is used, should I report the results from the last epoch? Which .pth file should I choose as the final model? And how large a fluctuation in AP between epochs is reasonable in the later stages of training?
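For readers unfamiliar with the idea behind the multi_scale question above, here is a minimal sketch of batch-level multi-scale training: each batch is resized to one randomly chosen square size before it reaches the backbone, so all images in a batch share a shape. It is not RT-DETR's actual code. The scale list, the function names, the nearest-neighbour resize, and the assumption that boxes are normalized (cx, cy, w, h) in [0, 1] are all hypothetical choices for illustration.

```python
import random

import numpy as np

# Hypothetical scale list: multiples of 32 around a 640x640 base size.
# Illustrative only; not the value configured in RT-DETR.
SCALES = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]


def resize_nearest(images: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an (N, C, H, W) batch to (N, C, size, size)."""
    _, _, h, w = images.shape
    # Map each output row/column back to a source row/column index.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    # Adjacent advanced indices broadcast to (size, size) in the last two dims.
    return images[:, :, rows[:, None], cols[None, :]]


def multi_scale_batch(images: np.ndarray, boxes, scales=SCALES, rng=random):
    """Resize the whole batch to one randomly chosen square size.

    Boxes are assumed to be normalized (cx, cy, w, h) in [0, 1], so they stay
    valid after resizing and are returned unchanged.
    """
    size = rng.choice(scales)
    return resize_nearest(images, size), boxes


if __name__ == "__main__":
    batch = np.zeros((2, 3, 640, 640), dtype=np.float32)
    out, _ = multi_scale_batch(batch, boxes=[[0.5, 0.5, 0.2, 0.2]])
    print(out.shape)
```

In a PyTorch pipeline the same step would more likely use `torch.nn.functional.interpolate` on the batch tensor, for example inside a collate function; evaluation is normally run at a single fixed size. Whether multi-scale helps depends on how much object scale varies in the data, which may explain why a dataset of pre-sliced images behaves differently.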

20231211 commented 4 weeks ago

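It seems EMA isn't taking effect? When saving, ModelEMA's state_dict returns dict(module=self.module.state_dict(), ...), and at test time the code uses module=self.ema.module if self.ema else self.model. So isn't it still using self.module rather than the averaged weights?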

lyuwenyu commented 3 weeks ago

At test time the code uses module=self.ema.module if self.ema else self.model, so isn't it still using self.module rather than the averaged weights?

If self.ema is not None, self.ema.module is used at test time as well.

20231211 commented 3 weeks ago

@lyuwenyu But doesn't self.ema.module still just hold the current epoch's self.module? It seems EMA isn't having any effect.
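Whether self.ema.module holds averaged weights depends on how it is created and updated. Below is a minimal sketch, assuming ModelEMA follows the common YOLOv5-style pattern: `self.module` is a separate deep copy of the model made once at construction, then blended in place on every `update()` call with a warmed-up decay. The class name `ToyModelEMA`, the dict-of-floats "model", and the decay/warmup values are hypothetical stand-ins for real `nn.Module` parameters.

```python
import copy
import math


class ToyModelEMA:
    """Hypothetical EMA over a dict of float parameters (illustration only).

    Assumed pattern: `self.module` is an independent deep copy of the model,
    never a reference to the training model itself.
    """

    def __init__(self, model: dict, decay: float = 0.9999, warmups: int = 2000):
        self.module = copy.deepcopy(model)  # independent copy, not an alias
        self.updates = 0
        # Ramp the decay up from 0 so early EMA weights track the model closely.
        self.decay_fn = lambda n: decay * (1 - math.exp(-n / warmups))

    def update(self, model: dict) -> None:
        self.updates += 1
        d = self.decay_fn(self.updates)
        for k, v in model.items():
            # ema = d * ema + (1 - d) * current weights
            self.module[k] = d * self.module[k] + (1 - d) * v

    def state_dict(self) -> dict:
        # Saving self.module therefore stores the *averaged* weights.
        return dict(module=copy.deepcopy(self.module), updates=self.updates)


if __name__ == "__main__":
    model = {"w": 0.0}
    ema = ToyModelEMA(model, decay=0.9, warmups=1)
    for step in range(1, 6):
        model["w"] = float(step)  # stand-in for an optimizer step
        ema.update(model)
    # The EMA copy lags behind the live weights instead of equalling them.
    print(model["w"], ema.module["w"])
```

Under this assumption, `self.module` inside the EMA object is not the current epoch's training model; it is the running average, so evaluating or saving `self.ema.module` uses averaged weights.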

lyuwenyu / RT-DETR, issue #325