Open fengshenfeilian opened 5 years ago
I have just the same question. The original paper says that for mixup attention they use the sigmoid function as the activation function, but apparently this implementation uses ReLU instead.
In fact, as a beginner, I'm a little confused about the mixup_criterion function in the train_mixup.py file. Could you give me some guidance on it?
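For context, here is a minimal sketch of what mixup_criterion usually does in mixup training code, following the canonical recipe from the original mixup paper (mixup-cifar10); I'm assuming this repo's version works the same way, so the names here may not match exactly:

```python
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    """Blend each input with a randomly permuted partner; return both targets and lam."""
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    """Loss on mixed inputs: the lam-weighted blend of the losses on both targets."""
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```

So the model sees the mixed image once, and the loss is just the same weighted average of the two labels' losses as was used to mix the inputs.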
Thanks for your work! I have a question about the expression of the mix attention: can it be represented as conv -> ReLU -> conv -> sigmoid?
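To make the question concrete, this is the kind of block I mean, as a hypothetical sketch (the channel sizes, reduction ratio, and the final blending step are my assumptions, not taken from this repo):

```python
import torch
import torch.nn as nn

class MixAttention(nn.Module):
    """Hypothetical conv -> ReLU -> conv -> sigmoid block; the trailing
    sigmoid keeps the attention map in (0, 1)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Attention map with the same spatial size as the input features.
        return self.body(x)

# Possible usage: blend two feature maps with the learned attention map.
# attn = MixAttention(64)
# a = attn(feat1)
# mixed = a * feat1 + (1 - a) * feat2
```

Is this the structure the paper intends, or does the ReLU in this implementation replace the sigmoid entirely?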