loudinthecloud / pytorch-ntm

Neural Turing Machines (NTM) - PyTorch Implementation
BSD 3-Clause "New" or "Revised" License
589 stars · 128 forks

Removed the unnecessary softplus in NTMHeadBase._address_memory #6

Closed · JulesGM closed this 6 years ago

JulesGM commented 6 years ago

Removed the softplus inside the softmax call:

        s = F.softmax(F.softplus(s), dim=1)

which becomes:

        s = F.softmax(s, dim=1)

softmax already constrains the values to (0, 1), so the softplus doesn't achieve anything there. PyTorch's softmax implementation is also already numerically stable, so that isn't a concern either.
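A quick way to sanity-check that claim: softmax maps any real vector, negative or extreme, to a valid probability distribution, so squashing inputs through softplus first is redundant for range purposes. A minimal pure-Python sketch of the standard numerically stable softmax (subtracting the max before exponentiating, the same trick PyTorch's implementation uses):

```python
import math

def stable_softmax(xs):
    # Subtract the max so exp() never overflows, even for huge inputs;
    # the shift cancels out in the ratio, leaving the result unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Negative and extreme inputs still yield a valid distribution:
print(stable_softmax([-5.0, 0.0, 5.0]))
print(stable_softmax([-1000.0, 0.0, 1000.0]))  # no overflow
```

Each output sums to 1 with every entry in [0, 1], without any positivity-enforcing softplus beforehand.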

loudinthecloud commented 6 years ago

Makes sense, thanks for that. Can you please run the copy-task notebook and confirm we're getting the same results?

JulesGM commented 6 years ago

I trained a bunch of models for quite a while, and got good results in the notebooks.

JulesGM commented 6 years ago

Like this one, which was trained for a while on sequences of up to 120 elements and converges very sharply:

(attached training plot: copy-train-120)

loudinthecloud commented 6 years ago

Tested it as well; it seems to alter convergence slightly, but perhaps for the better.