Open flywheel1412 opened 1 year ago
You got the point. I will give it a try when I get my server back:)
I found that many crowd counting networks use VGG16 as the backbone for feature extraction. Maybe this task relies on low-level features, so a deeper, more powerful backbone does not help (I tried some large backbones without success). A shallow, wide backbone combined with a strong FPN might work better. I'm testing this hypothesis now; hope it's helpful for you.
I think the official network's heads use the P3 feature map, while yours use the P4 feature map; this may degrade the precision.
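To put rough numbers on the P3 versus P4 difference, here is a minimal sketch. It assumes the usual FPN convention that level P_l has stride 2^l relative to the input (so P3 is stride 8 and P4 is stride 16); I have not checked that either implementation follows it exactly. The helper name `fpn_grid` and the 512x512 input size are made up for illustration.

```python
def fpn_grid(height, width, level):
    """Return (stride, rows, cols) for FPN level P_level.

    Assumes stride = 2 ** level (common FPN convention) and that the
    feature map size is the input size divided by the stride, rounded up.
    """
    stride = 2 ** level
    rows = -(-height // stride)  # ceiling division
    cols = -(-width // stride)
    return stride, rows, cols


if __name__ == "__main__":
    # Hypothetical 512x512 input, just for comparison.
    for level in (3, 4):
        stride, rows, cols = fpn_grid(512, 512, level)
        print(f"P{level}: stride={stride}, grid={rows}x{cols}, "
              f"locations={rows * cols}")
```

Under these assumptions P4 has a quarter as many spatial locations as P3, and each location covers a 16x16 pixel patch instead of 8x8. In dense crowds where heads are only a few pixels apart, that coarser grid could plausibly cost precision.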