lalalune / arcprize


Investigate padding tokens / dynamic batching #8

Closed lalalune closed 1 month ago

lalalune commented 1 month ago

Right now we are using a LOT of padding tokens. Will this hurt training? I don't know; the batches are pretty sparse. We could try dynamic batching to cut the padding down, or implement sparse attention.
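One way to reduce padding without touching the attention mechanism is length-bucketed (dynamic) batching: sort sequences by length so each batch only pads to its own max, not the global max. A minimal sketch, assuming plain Python lists of token ids; the function names and `pad_id` are illustrative, not from this repo:

```python
def make_batches(seqs, batch_size, pad_id=0):
    """Sort sequences by length, batch, and pad each batch to its own max."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        chunk = [seqs[i] for i in order[start:start + batch_size]]
        max_len = max(len(s) for s in chunk)
        # Pad each sequence only up to this batch's longest sequence.
        batches.append([s + [pad_id] * (max_len - len(s)) for s in chunk])
    return batches

def padding_fraction(batches, pad_id=0):
    """Fraction of tokens across all batches that are padding."""
    total = sum(len(s) for b in batches for s in b)
    pads = sum(s.count(pad_id) for b in batches for s in b)
    return pads / total
```

For example, `[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4]]` with `batch_size=2` pads only 2 of 12 tokens, versus 6 of 16 if everything were padded to the global max of 4. The trade-off is that sorting by length correlates examples within a batch, so shuffling bucket order per epoch is usually a good idea.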