jthickstun / anticipation

Anticipatory Autoregressive Models
Apache License 2.0

[Feature request] Acceleration #15

Open olegchomp opened 7 months ago

olegchomp commented 7 months ago

Thank you for the great repo! It would be great to have some kind of acceleration, e.g. TensorRT.

jthickstun commented 7 months ago

I have no immediate plans to do this myself, but if someone wants to submit a pull request I'd be happy to work together to support faster inference.

I have been experimenting with transformer KV caching on this branch, which does speed things up for longer generations. I haven't merged it into main yet because I need to find time to do more thorough testing (warning: there could be bugs).
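
For readers unfamiliar with the idea, here is a minimal sketch of why KV caching helps on longer generations. It is not the code from the branch mentioned above; `attend`, `KVCache`, and `full_recompute` are hypothetical names, and the query, key, and value vectors are assumed to be given rather than projected from hidden states as they would be in a real transformer.

```python
import math


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


class KVCache:
    """Hypothetical cache holding keys/values of positions already generated."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append only the newest position, then attend over everything cached.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def full_recompute(qs, ks, vs):
    """Baseline: causal attention for each position, with no reuse between steps."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]
```

In a real model the expensive part is recomputing the keys and values for the whole prefix at every step; with a cache, each generation step only computes them for the newest token, which is why the saving grows with sequence length. Both paths should produce the same outputs.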

Lion-Mod commented 7 months ago

@jthickstun have you considered quantisation of the model for a perf improvement? I suppose the main downside is that the output quality may get a little dicey.
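
To make the trade-off concrete, here is a rough sketch of symmetric int8 quantisation applied to a flat list of weights. It is an illustration only, not something this repo provides; `quantize_int8` and `dequantize` are hypothetical helpers, and a real setup would more likely rely on a framework's quantisation tooling with per-channel scales.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [x * scale for x in quantized]
```

Each restored weight differs from the original by at most about half of `scale`, so the rounding error is small per weight, but it accumulates across layers, which is where the "dicey" outputs could come from.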