rai-llc / LanguageModels.jl

Load nanoGPT-style transformers in Julia. Code ported from @karpathy's llama2.c
MIT License

Support sharded pytorch models #3

Open jiahao opened 1 year ago

jiahao commented 1 year ago

The current PyTorch loader doesn't handle multi-file models, which appear to be sharded by rows. For example, in the llama2-13b model directory, w2[1:2560,:] is in consolidated.00.pth and w2[2561:5120,:] is in consolidated.01.pth.
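In principle, row-sharded weights like this can be reassembled by stacking the per-file slices along the first dimension. The sketch below is illustrative only, using a small stand-in matrix rather than the package's actual loader or the real .pth files, and assumes shards cover contiguous, non-overlapping row ranges in file order:

```julia
# Minimal sketch of reassembling a row-sharded weight matrix (Base Julia only).
# In reality each shard would come from consolidated.00.pth, consolidated.01.pth, ...;
# here we fake them by slicing a small full matrix.

full = reshape(1.0:20.0, 4, 5)          # stand-in for the full weight matrix w2

# Row shards, analogous to w2[1:2560,:] and w2[2561:5120,:] in the two .pth files
shards = [full[1:2, :], full[3:4, :]]

# Stack the shards along the row dimension to recover the full matrix
reassembled = reduce(vcat, shards)

@assert reassembled == full
```

The same `reduce(vcat, ...)` pattern would apply after each shard's tensor is deserialized, provided the loader reads the consolidated files in numeric order.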