lucidrains / perceiver-pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
MIT License

Network can't train when incorporating this #13

Open abeyang00 opened 3 years ago

abeyang00 commented 3 years ago

I added Perceiver to my current network, and it seems the network can't be trained: AP stays at zero the whole time and never improves.

Does the code need to be changed in order to incorporate it into another network?

clementpoiret commented 3 years ago

I saw the same problem. In fact, it doesn't work well in FP16: I get NaNs very quickly (generally by epoch 2). Maybe try FP32? Sometimes it doesn't converge either. Here is my code: https://github.com/clementpoiret/Perceiver_MNIST
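
If it helps anyone debugging the same NaNs: here is a minimal FP32 sketch of a single training step, with no autocast or `.half()` anywhere (the hyperparameters are illustrative, not the ones from my repo):

```python
import torch
from torch import nn
from perceiver_pytorch import Perceiver

# Illustrative hyperparameters for MNIST-sized inputs; tune for your task.
model = Perceiver(
    input_channels=1,      # single-channel images
    input_axis=2,          # 2D data
    num_freq_bands=6,
    max_freq=10.,
    depth=4,
    num_latents=64,
    latent_dim=128,
    cross_heads=1,
    latent_heads=4,
    cross_dim_head=64,
    latent_dim_head=64,
    num_classes=10,
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 28, 28, 1)   # (batch, H, W, C), FP32 by default
labels = torch.randint(0, 10, (8,))

logits = model(images)               # (8, 10) -- no autocast, no .half()
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```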

amqdn commented 3 years ago

@clementpoiret I took a quick look at your repo: Are you trying to classify MNIST?

I haven't used it myself yet, but I think the user needs to specify the objective by adding a head to the Perceiver (e.g., a classification head).

amqdn commented 3 years ago

@clementpoiret

Never mind. I see in the code now that to_logits includes a Linear layer to num_classes, and that you've also included that in your code. Huh.
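
For reference, the tail of the model is roughly this (paraphrasing the repo's source; the exact layers may differ between versions):

```python
from torch import nn
from einops.layers.torch import Reduce

latent_dim, num_classes = 128, 10    # illustrative sizes

# Roughly what Perceiver's `to_logits` does internally: pool the latent
# array, normalize, then project to class logits.
to_logits = nn.Sequential(
    Reduce('b n d -> b d', 'mean'),  # average over the latents
    nn.LayerNorm(latent_dim),
    nn.Linear(latent_dim, num_classes),
)
```

So the model's output is already a (batch, num_classes) tensor of logits; no extra head should be needed.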

clementpoiret commented 3 years ago

Yes, you're right. I gave it a quick try, but it's pretty slow to converge, and sometimes it doesn't learn at all.

OctoberKat commented 3 years ago

Maybe you should try a warmup learning-rate scheduler? Transformers are particularly sensitive to the learning-rate schedule.
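
For example, a minimal linear-warmup sketch using LambdaLR (the warmup_steps value and the stand-in model are illustrative, not specific to this thread):

```python
import torch
from torch import nn

model = nn.Linear(10, 10)            # stand-in for the Perceiver
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

warmup_steps = 1000                  # illustrative; tune for your task

def warmup(step):
    # Ramp the LR linearly from ~0 to its base value over warmup_steps,
    # then hold it constant.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup)

for step in range(5):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()                 # one scheduler step per optimizer step
```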