PkuRainBow closed this issue 3 years ago
Thanks for your interest in our work. I think my implementation of the h_swish activation and torch.nn.Hardswish represent the same function. You can verify this by going through each piece of the piecewise definition and confirming that they match exactly.
As for the last activation function in the SE module, I don't think h_sigmoid and nn.Sigmoid would lead to significantly different results, but you are welcome to try.
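To put a number on how close the two gates are, here is a small sketch (assuming h_sigmoid is the usual hard approximation `ReLU6(x + 3) / 6`) that measures the largest pointwise gap between it and the true logistic sigmoid:

```python
import math

def relu6(x):
    return min(max(x, 0.0), 6.0)

def h_sigmoid(x):
    # Hard sigmoid: ReLU6(x + 3) / 6, a piecewise-linear gate in [0, 1]
    # (assumed form of the repo's h_sigmoid)
    return relu6(x + 3.0) / 6.0

def sigmoid(x):
    # True logistic sigmoid, as used by nn.Sigmoid
    return 1.0 / (1.0 + math.exp(-x))

# Scan a wide input range for the largest absolute deviation
xs = [i / 100.0 for i in range(-800, 801)]
max_diff = max(abs(h_sigmoid(x) - sigmoid(x)) for x in xs)
print(max_diff)  # worst-case gap is under 0.07
```

The deviation peaks around |x| ≈ 1.3 and stays below 0.07 everywhere, which supports the intuition that swapping the two gates should not change the SE scaling factors dramatically.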
Really nice work!
I am impressed by the reported results in Table 2, as follows:
Then I carefully checked your implementation of the h_swish function:
https://github.com/ofsoundof/LocalViT/blob/b3bd9c49e338c591bc1909a5bd64aeb9fc82a010/models/localvit.py#L16-L31
The above implementation appears to differ from the official torch.nn.Hardswish, shown below:
It would be great if you could share your thoughts on this discrepancy.
Besides, we also noticed that you apply h_sigmoid instead of nn.Sigmoid in the SE implementation:
https://github.com/ofsoundof/LocalViT/blob/b3bd9c49e338c591bc1909a5bd64aeb9fc82a010/models/localvit.py#L55-L64
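For reference, the place where this choice matters can be sketched as a minimal SE-style channel gate in pure Python (a hypothetical simplification, not the repo's exact code: per-channel means stand in for global average pooling, and plain lists stand in for tensors). The `gate` argument marks the slot where h_sigmoid or nn.Sigmoid would be plugged in:

```python
def relu(x):
    return max(x, 0.0)

def h_sigmoid(x):
    # Hard sigmoid gate: ReLU6(x + 3) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

def se_gate(channel_means, w1, w2, gate=h_sigmoid):
    # Minimal SE-style recalibration sketch (hypothetical names and shapes):
    #   squeeze: channel_means plays the role of global average pooling
    #   excite:  two small linear maps with a ReLU in between
    #   gate:    final activation producing per-channel scales in [0, 1]
    hidden = [relu(sum(w * m for w, m in zip(row, channel_means))) for row in w1]
    scales = [gate(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    return scales

# Toy usage: 3 channels squeezed to a 2-unit bottleneck and back
means = [0.2, -0.5, 1.0]
w1 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]   # reduction weights (3 -> 2)
w2 = [[0.5, 0.5], [-0.5, 0.5], [0.2, 0.2]]  # expansion weights (2 -> 3)
scales = se_gate(means, w1, w2)
print(scales)  # one multiplicative scale per channel, each in [0, 1]
```

Because only this final gate differs between the two variants, and the gates themselves are numerically close, the resulting per-channel scales should differ only slightly.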
We hope to hear your comments on how this difference influences the final performance.