ofsoundof / LocalViT

MIT License

Ask about the difference between "h_swish" and "torch.nn.Hardswish" #2

Closed: PkuRainBow closed this issue 3 years ago.

PkuRainBow commented 3 years ago

Really nice work!

I am impressed by the results reported in Table 2, shown below:

[Screenshot of Table 2]

Then I carefully checked your implementation of the h_swish function:

https://github.com/ofsoundof/LocalViT/blob/b3bd9c49e338c591bc1909a5bd64aeb9fc82a010/models/localvit.py#L16-L31

However, the implementation above differs from the official implementation of torch.nn.Hardswish, shown below:

[Screenshot of the official torch.nn.Hardswish definition]

It would be great if you could share your thoughts on this problem.
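For reference, the piecewise definition documented for torch.nn.Hardswish can be set beside the h-swish formula from MobileNetV3, x · ReLU6(x + 3) / 6. The linked LocalViT code is not reproduced in this thread, so treating its h_swish as that ReLU6 composition is an assumption. Splitting on the value of x + 3 then reduces the composition to the same three cases:

```latex
\begin{aligned}
\mathrm{ReLU6}(z) &= \min\bigl(\max(0, z),\, 6\bigr),\\
\mathrm{h\_swish}(x) &= x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6},\\
\mathrm{Hardswish}(x) &=
\begin{cases}
0, & x \le -3,\\
x, & x \ge 3,\\
\dfrac{x\,(x + 3)}{6}, & \text{otherwise,}
\end{cases}\\[6pt]
x \le -3 &\;\Rightarrow\; x + 3 \le 0 \;\Rightarrow\; \mathrm{h\_swish}(x) = x \cdot \tfrac{0}{6} = 0,\\
x \ge 3 &\;\Rightarrow\; x + 3 \ge 6 \;\Rightarrow\; \mathrm{h\_swish}(x) = x \cdot \tfrac{6}{6} = x,\\
-3 < x < 3 &\;\Rightarrow\; 0 < x + 3 < 6 \;\Rightarrow\; \mathrm{h\_swish}(x) = \tfrac{x\,(x + 3)}{6}.
\end{aligned}
```

Under that assumption, the two forms differ only in how they are written, not in the function they compute.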

Besides, we also found that you apply h_sigmoid instead of nn.Sigmoid in the SE implementation:

https://github.com/ofsoundof/LocalViT/blob/b3bd9c49e338c591bc1909a5bd64aeb9fc82a010/models/localvit.py#L55-L64

I hope to hear your comments on how this difference affects the final performance.
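To make clear where the choice enters, here is a minimal, framework-free sketch of a squeeze-and-excitation (SE) gate in which the final nonlinearity is a parameter. It is not the LocalViT module: it assumes a bias-free FC → ReLU → FC excitation, represents each channel as a plain list of values, and uses hypothetical helper names (`relu6`, `h_sigmoid`, `sigmoid`, `se_gate`). h_sigmoid is taken to be ReLU6(z + 3) / 6, the usual MobileNetV3-style form.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def relu6(z: float) -> float:
    """Clamp z to the range [0, 6]."""
    return min(max(0.0, z), 6.0)


def h_sigmoid(z: float) -> float:
    """Piecewise-linear sigmoid approximation (assumed form): ReLU6(z + 3) / 6."""
    return relu6(z + 3.0) / 6.0


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def se_gate(channels: Matrix, w1: Matrix, w2: Matrix,
            gate: Callable[[float], float]) -> Matrix:
    """Hypothetical bias-free SE block over per-channel lists of values."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(c) / len(c) for c in channels]
    # Excitation: FC -> ReLU -> FC, producing one logit per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    # The step under discussion: the final gating nonlinearity.
    scales = [gate(z) for z in logits]
    # Rescale every value in each channel by that channel's gate.
    return [[s * v for v in c] for s, c in zip(scales, channels)]


# Toy usage: two channels, same weights, two different gates.
channels = [[1.0, 3.0], [-2.0, 0.0]]
w1 = [[0.5, 0.0], [0.0, -1.0]]
w2 = [[1.0, 0.0], [0.0, -1.0]]
print(se_gate(channels, w1, w2, h_sigmoid))
print(se_gate(channels, w1, w2, sigmoid))
```

Swapping `gate` changes only the per-channel scale factors; everything upstream (pooling, the two FC layers) is identical, which is why the question reduces to how closely h_sigmoid tracks the sigmoid.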

ofsoundof commented 3 years ago

Thanks for your interest in our work. I believe my implementation of the h_swish activation and torch.nn.Hardswish compute the same function. You can verify this by checking each piece of the piecewise definition and confirming that the two agree on every interval.
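The piece-by-piece check suggested above can also be done numerically. This sketch assumes, as before, that h_swish is the MobileNetV3-style x · ReLU6(x + 3) / 6 (not copied from the linked file), and compares it with the piecewise formula from the torch.nn.Hardswish documentation on a dense grid plus points right at and next to the breakpoints x = ±3. It uses plain Python floats rather than PyTorch, and the helper names are hypothetical.

```python
import math


def relu6(z: float) -> float:
    """Clamp z to [0, 6], matching the definition of torch.nn.ReLU6."""
    return min(max(0.0, z), 6.0)


def h_swish(x: float) -> float:
    """Assumed MobileNetV3-style hard swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def hardswish_reference(x: float) -> float:
    """Piecewise definition documented for torch.nn.Hardswish."""
    if x <= -3.0:
        return 0.0
    if x >= 3.0:
        return x
    return x * (x + 3.0) / 6.0


def max_abs_diff(xs) -> float:
    """Largest pointwise disagreement between the two forms over xs."""
    return max(abs(h_swish(x) - hardswish_reference(x)) for x in xs)


# Dense grid over [-6, 6] plus the breakpoints and points just beside them.
grid = [i / 1000 for i in range(-6000, 6001)]
edges = [-3.0, -3.0 - 1e-9, -3.0 + 1e-9, 3.0, 3.0 - 1e-9, 3.0 + 1e-9]
print(max_abs_diff(grid + edges))
```

Any nonzero result comes only from floating-point rounding (for x ≥ 3 the ReLU6 form computes (x · 6) / 6 instead of returning x directly), so it stays far below anything that could matter for training.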

As for the final activation function in the SE module, I don't expect h_sigmoid and nn.Sigmoid to lead to significantly different results, but you are welcome to try it.
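To put a rough number on that expectation, the sketch below measures the pointwise gap between h_sigmoid (assumed to be ReLU6(x + 3) / 6, which matches the piecewise definition documented for torch.nn.Hardsigmoid) and the logistic sigmoid on a grid over [-8, 8]. It only quantifies how far apart the two gate curves are; it says nothing about accuracy after training, which would still need the experiment suggested above. Function names are illustrative.

```python
import math


def h_sigmoid(x: float) -> float:
    """Assumed hard sigmoid: ReLU6(x + 3) / 6."""
    return min(max(0.0, x + 3.0), 6.0) / 6.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


xs = [i / 100 for i in range(-800, 801)]  # grid over [-8, 8], step 0.01
gaps = [abs(h_sigmoid(x) - sigmoid(x)) for x in xs]
worst = max(gaps)
worst_x = xs[gaps.index(worst)]
print(f"max |h_sigmoid - sigmoid| = {worst:.4f} at x = {worst_x:+.2f}")
```

Both curves equal 0.5 at x = 0, and the largest gap works out to roughly 0.07, near x ≈ ±1.32. In an SE block, that bounds how much each channel's scale factor can shift when one gate is swapped for the other.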