lalalune / arcprize

Disable grokfast and remove gradient checkpointing, add schedule free optimizer, add quadtree position encoder #23

Closed lalalune closed 3 months ago

lalalune commented 3 months ago

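This PR removes grokfast, which was causing NaNs, and gradient checkpointing, which saves memory at the cost of extra compute but isn't necessary here. It also adds a schedule-free optimizer and a quadtree position encoder.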
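
For context, here is a minimal, framework-free sketch of an EMA-style gradient filter of the kind Grokfast applies, showing one way such a filter can keep producing NaNs once a single bad gradient enters it. This is an illustration, not the code in this repository: the function names, the `enabled` flag, and the `alpha` and `lamb` values are assumptions, and it is not a claim about the actual cause of the NaNs seen here.

```python
import math


def grokfast_ema(grads, ema, alpha=0.98, lamb=2.0):
    """One step of an EMA gradient filter (hypothetical helper).

    Keeps a running average of past gradients and adds it, scaled by
    `lamb`, back onto the current gradient. Returns (filtered, new_ema).
    """
    new_ema = [alpha * e + (1.0 - alpha) * g for e, g in zip(ema, grads)]
    filtered = [g + lamb * e for g, e in zip(grads, new_ema)]
    return filtered, new_ema


def filter_step(grads, ema, enabled=True):
    """Apply the filter only when enabled and the gradients are finite.

    With enabled=False the gradients pass through untouched, which is
    the effect of switching the filter off.
    """
    if not enabled or not all(math.isfinite(g) for g in grads):
        return list(grads), ema
    return grokfast_ema(grads, ema)


# A single NaN gradient contaminates the running average, so every
# later filtered gradient is NaN too, even if the raw gradient is fine.
_, poisoned = grokfast_ema([float("nan")], [0.0])
later, _ = grokfast_ema([1.0], poisoned)
print(math.isnan(later[0]))
```

Because the running average persists across steps, skipping non-finite gradients or turning the filter off are the two simple ways to stop the contamination from spreading in this sketch.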