Open nenkoru opened 1 year ago
It's worth mentioning that I meant loading through the PyTorch bindings. With #19 merged, a 7b model loads on my machine in 5-6s. I haven't tried 14b yet. There is also an ongoing PR, #14, which is supposed to give a big boost to model loading time. The caveat is that on AMD, for some reason, mmapping doesn't work well.
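To illustrate why mmapping can help loading time, here is a minimal sketch using only the Python standard library. It is not the loader from #14 or #19; the helper names `read_fully`, `map_file`, and `timed` are made up for this example, and it says nothing about the AMD problem. It only contrasts copying a whole file into memory with mapping it, so that pages are read lazily on first access.

```python
import mmap
import os
import tempfile
import time


def read_fully(path):
    """Eager load: copy the entire file into a bytes object."""
    with open(path, "rb") as f:
        return f.read()


def map_file(path):
    """Lazy load: map the file read-only; pages are faulted in on access."""
    with open(path, "rb") as f:
        # Python's mmap duplicates the descriptor, so the mapping
        # remains usable after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a model file: 64 MiB of zeros.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\0" * (64 * 1024 * 1024))
    os.close(fd)

    _, eager = timed(read_fully, path)
    mapped, lazy = timed(map_file, path)
    print(f"full read: {eager:.4f}s, mmap: {lazy:.4f}s")

    mapped.close()
    os.remove(path)
```

Note that the mmap timing only covers creating the mapping. The actual disk reads happen later, when the weights are first touched, so total time to first token may improve less than the load time alone suggests.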
Currently, the 7b and 14b models take 10s and 15s respectively to load, pretty much the same as vanilla rwkv. It would be great to make these models load as fast as possible, which could lead to great inference capabilities.
I guess the best milestone to begin with would be half of those: 5s and ~7s respectively.