Open abeyang00 opened 3 years ago
I saw the same problem. In fact, it doesn't work well in FP16: I'm getting NaNs really quickly (generally at epoch 2). Maybe try FP32? Sometimes it doesn't converge either. Here is my code: https://github.com/clementpoiret/Perceiver_MNIST
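For context on why FP16 can blow up where FP32 does not, here is a minimal sketch of the half-precision range limit. It is not a diagnosis of the repo above. It only illustrates one common mechanism: an intermediate value (for example the exponential of a large attention logit) exceeds the largest finite FP16 value, becomes `inf`, and later operations such as `inf - inf` or `inf / inf` produce NaN. The helper name `fits_in_fp16` and the logit value are made up for illustration.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(x: float) -> bool:
    """Return True if x is finite and representable as a finite FP16 value."""
    if not math.isfinite(x):
        return False
    try:
        struct.pack("<e", x)  # "e" is the struct format code for binary16
        return True
    except OverflowError:
        # struct refuses values that would round to infinity in half precision
        return False


# Hypothetical attention logit; exp(12) is already far above FP16_MAX.
logit = 12.0
print(fits_in_fp16(math.exp(logit)))          # overflows the FP16 range
print(fits_in_fp16(math.exp(logit - logit)))  # max-subtracted exp() stays at 1.0
```

If this is what is happening, keeping the softmax and loss computation in FP32, or training fully in FP32 as suggested above, avoids the overflow.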
@clementpoiret I took a quick look at your repo: Are you trying to classify MNIST?
Having not used it myself yet, I think the user needs to specify the objective by adding a head to the Perceiver (e.g., a classifier head).
@clementpoiret Never mind. I see in the code now that `to_logits` includes a Linear layer to `num_classes`, and that you've also included that in your code. Huh.
Yes, you're right, I tried this quickly. But it's pretty slow to converge, and sometimes it doesn't learn at all.
Maybe you should try a warmup learning rate scheduler? Transformers are particularly sensitive to the learning rate schedule.
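A minimal sketch of what such a schedule could look like: linear warmup followed by cosine decay, written as a plain function that returns a multiplier for the base learning rate. The function name `warmup_cosine_factor` and the choice of cosine decay after warmup are assumptions for illustration, not something taken from the Perceiver code.

```python
import math


def warmup_cosine_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp: reaches 1.0 on the last warmup step.
        return (step + 1) / warmup_steps
    # Cosine decay from 1.0 down to 0.0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Print the multiplier at a few steps of a 100-step run with 10 warmup steps.
for s in (0, 4, 9, 10, 55, 100):
    print(s, round(warmup_cosine_factor(s, warmup_steps=10, total_steps=100), 4))
```

In PyTorch, a function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, with `scheduler.step()` called once per optimizer step. Suitable values for `warmup_steps` and `total_steps` depend on the dataset and batch size, so treat any numbers here as placeholders.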
I have added the Perceiver to my current network, and it seems like the network can't be trained. AP stays at zero the whole time and it doesn't train at all.
Does the code need to be changed in order to incorporate it into another network?