@LWprogramming oh, so the latest transformers literature actually finds dropout isn't that useful past a certain scale, which is why i keep those rates at 0, but still have the logic in there in case some traditionalists want to turn it on
i've already incorporated the best kind of structural dropout for autoregressive transformers!
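For concreteness, here is a minimal sketch of that "keep the rate at 0 but keep the logic" pattern, assuming a PyTorch-style feedforward sublayer; the names (`FeedForward`, `dropout`) are illustrative, not taken from the actual code under discussion:

```python
# A minimal sketch (assumptions, not the thread author's actual code) of
# keeping dropout wired through a transformer sublayer while defaulting
# its rate to 0, so it is a no-op unless explicitly enabled.
import torch
from torch import nn

class FeedForward(nn.Module):
    def __init__(self, dim, mult=4, dropout=0.):  # rate kept at 0 by default
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * mult),
            nn.GELU(),
            nn.Dropout(dropout),  # identity when dropout == 0.
            nn.Linear(dim * mult, dim),
        )

    def forward(self, x):
        return self.net(x)

ff = FeedForward(dim=512)                      # default: dropout stays off
ff_opt_in = FeedForward(dim=512, dropout=0.1)  # "traditionalists" can turn it on
x = torch.randn(1, 16, 512)
out = ff(x)  # shape (1, 16, 512)
```

Defaulting the rate to 0 rather than deleting the `nn.Dropout` entirely keeps the constructor signature stable, so enabling regularization later is a one-argument change instead of a code change.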
"Turn on, tune in, drop out" I guess is a bit old, but some of the wisdom still remains. Perhaps we ML practitioners should "Turn on, tune in, and drop out dropout"?