Closed zhangzhili1112 closed 2 years ago
Mix MaxPool1D and Conv1D as the downsampling operator, following CSP convolution. Although the accuracy of ActionFormer on my dataset has improved a lot (thank you again for the great work), it is still not enough, so I want to add some other tricks to improve the accuracy.
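For readers unsure what such a mixed downsampling operator could look like, here is a minimal PyTorch sketch. It is only one possible reading of the idea, not code from the ActionFormer repository: the class name `MixedDownsample1D`, the equal channel split between the two branches, and the kernel sizes are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class MixedDownsample1D(nn.Module):
    """Hypothetical 2x temporal downsampling mixing MaxPool1d and a strided Conv1d."""

    def __init__(self, channels: int):
        super().__init__()
        if channels % 2 != 0:
            raise ValueError("channels must be even so the two branches can be concatenated")
        half = channels // 2
        # Branch A: max pooling halves the length, then a 1x1 conv halves the channels.
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(kernel_size=2, stride=2),
            nn.Conv1d(channels, half, kernel_size=1),
        )
        # Branch B: a 1x1 conv halves the channels, then a strided conv halves the length.
        self.conv_branch = nn.Sequential(
            nn.Conv1d(channels, half, kernel_size=1),
            nn.Conv1d(half, half, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time); time is assumed to be even,
        # otherwise the two branches produce different lengths.
        return torch.cat([self.pool_branch(x), self.conv_branch(x)], dim=1)


x = torch.randn(2, 64, 128)      # (batch, channels, time)
y = MixedDownsample1D(64)(x)
print(y.shape)                   # torch.Size([2, 64, 64])
```

The output keeps the channel count and halves the temporal length, so a module like this could in principle stand in for a plain strided pooling or convolution step. Whether it helps accuracy is an empirical question for the dataset at hand.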
Hi, thank you for your interest in our ActionFormer.
As you can see, we already support the traditional FPN structure in the neck. However, we found that a 1D FPN does not boost performance, and some other papers report the same observation. So simply modifying the FPN structure may not improve accuracy. If you want to improve accuracy, a good choice is to decrease the temporal stride (use denser features) or to switch the backbone; using a very recent Transformer backbone should help here.
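To make the "decrease the temporal stride" suggestion concrete, the small sketch below counts how many clip-level features a sliding-window extractor would produce for a given stride. The helper names and the numbers used (3000 frames, 16-frame clips, 30 fps) are illustrative assumptions, not ActionFormer's actual configuration.

```python
def num_clip_features(num_frames: int, clip_len: int, stride: int) -> int:
    """Number of sliding-window clips (one feature vector each) that fit in a video."""
    if num_frames < clip_len:
        return 0
    return (num_frames - clip_len) // stride + 1


def seconds_per_feature(stride: int, fps: float) -> float:
    """Time between the starts of two consecutive clips."""
    return stride / fps


# Hypothetical video: 3000 frames at 30 fps, features extracted from 16-frame clips.
for stride in (16, 8, 4):
    n = num_clip_features(num_frames=3000, clip_len=16, stride=stride)
    step = seconds_per_feature(stride, fps=30.0)
    print(f"stride={stride}: {n} features, one every {step:.3f} s")
```

A smaller stride gives the model a longer and finer-grained feature sequence for the same video, at the cost of more feature extraction and a longer input to the backbone.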
@tzzcl Thanks for your valuable suggestion.
Closed due to inactivity.
Hi! Thanks for your great work. I added BiFPN to the network, but the performance dropped. This was tested on my own dataset. Have you ever tested adding a pyramid structure similar to BiFPN?