Open ThanhNhann opened 4 years ago
I think the reason is that crowd counting does not need deep features that carry semantic information. That semantic information might hurt performance, since we mainly need shallower features such as edges.
@doubbblek Do you know of a relevant paper that mentions this? Thanks for your answer.
I have read your paper and don't understand why you use only the first ten layers of VGG-16, with only three pooling layers, instead of the full pre-trained VGG-16 architecture. Thanks.
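For reference, here is a rough sketch of what that truncation amounts to. It assumes the standard VGG-16 layout (13 convolutional layers and 5 max-pooling layers) and that "first ten layers" means the first ten convolutional layers. The helper names are made up for illustration, and this is plain Python arithmetic, not the repository's code.

```python
# Standard VGG-16 convolutional layout: numbers are output channels of a
# 3x3 conv layer, "M" is a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def truncate_after_n_convs(cfg, n_convs):
    """Keep layers up to and including the n-th conv layer (hypothetical helper)."""
    kept = []
    convs = 0
    for item in cfg:
        kept.append(item)
        if item != "M":
            convs += 1
            if convs == n_convs:
                break
    return kept


def output_stride(cfg):
    """Each 2x2 pooling layer halves the spatial resolution."""
    return 2 ** cfg.count("M")


def feature_map_size(height, width, cfg):
    """Spatial size after all pooling layers (3x3 convs assumed padded)."""
    for item in cfg:
        if item == "M":
            height, width = height // 2, width // 2
    return height, width


front_end = truncate_after_n_convs(VGG16_CFG, 10)

print(front_end.count("M"), output_stride(front_end))   # pooling layers, stride
print(VGG16_CFG.count("M"), output_stride(VGG16_CFG))
print(feature_map_size(768, 1024, front_end))
print(feature_map_size(768, 1024, VGG16_CFG))
```

Under these assumptions the truncated front end downsamples by a factor of 8, while the full convolutional stack downsamples by 32. For an example 768x1024 input that is a 96x128 feature map versus 24x32, so the truncated version keeps more spatial detail for the density map.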