neel04 / ReAct

Exploring the addition of recurrent priors to attention-based models

Merging ALiBi into Main #1

Closed neel04 closed 1 year ago

neel04 commented 1 year ago

Adds ALiBi and FP16/BF16 training with AMP. More experiments are needed to confirm ALiBi's performance: current results show that training converges, but the model has difficulty generalizing and learning without positional encodings.
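
For context, here is a minimal sketch of the ALiBi bias as it is usually formulated, not the code merged in this PR. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two head count and causal (left-to-right) attention, neither of which is confirmed by this PR.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8)."""
    # Assumption: num_heads is a power of two (the simple case of the usual scheme).
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two number of heads")
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Build bias[h][i][j] = -slope_h * (i - j) for keys j <= i.

    Future keys (j > i) get -inf, which assumes causal attention.
    """
    neg_inf = float("-inf")
    return [
        [
            [-m * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    bias = alibi_bias(num_heads=8, seq_len=4)
    # Head 0 has the steepest slope, so distant keys are penalized the most.
    for row in bias[0]:
        print(row)
```

The bias is added to the attention logits before the softmax, so no positional encodings are added to the token embeddings; the penalty grows linearly with the distance between query and key.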