Closed lalalune closed 1 month ago
Right now we are using a LOT of padding tokens. Will this hurt results? I don't know. The input is pretty sparse, so we could try implementing sparse attention.
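A quick way to put a number on "a LOT" is to measure the padding fraction of a batch. The sketch below also shows a plain padding mask on the attention weights. That is a simpler technique than sparse attention, not a replacement for it: padded keys get zero weight, but the compute is still spent. `PAD_ID`, `padding_fraction`, `masked_softmax`, and the toy batch are all hypothetical names and values for illustration, not anything from this repo.

```python
import math

PAD_ID = 0  # assumption: id of the padding token; use the tokenizer's real value


def padding_fraction(batch, pad_id=PAD_ID):
    """Share of token positions in the batch that are padding."""
    total = sum(len(seq) for seq in batch)
    pads = sum(tok == pad_id for seq in batch for tok in seq)
    return pads / total if total else 0.0


def masked_softmax(scores, key_is_pad):
    """Softmax over one query's attention scores, giving padded keys zero weight."""
    kept = [s for s, p in zip(scores, key_is_pad) if not p]
    if not kept:
        # Every key is padding: return all zeros instead of dividing by zero.
        return [0.0] * len(scores)
    m = max(kept)  # subtract the max for numerical stability
    exps = [0.0 if p else math.exp(s - m) for s, p in zip(scores, key_is_pad)]
    z = sum(exps)
    return [e / z for e in exps]


# Toy batch padded to length 8 (made-up token ids).
batch = [
    [5, 9, 2, 0, 0, 0, 0, 0],
    [7, 3, 0, 0, 0, 0, 0, 0],
]
print(padding_fraction(batch))  # 0.6875
print(masked_softmax([1.0, 1.0, 5.0], [False, False, True]))  # [0.5, 0.5, 0.0]
```

If the fraction comes out high, bucketing sequences by length before batching may be worth trying before sparse attention, since it cuts the padding itself rather than only hiding it.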