Open Hanqer opened 3 years ago
It seems that MHSA has only one head in the released code, but the paper uses 4 heads in MHSA. Is this a simplification for the CIFAR dataset?