microsoft / deep-language-networks

We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer into the next. We call the stacked architecture a Deep Language Network (DLN).
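The stacking idea above can be sketched in a few lines. This is a hypothetical illustration, not the project's actual API: the `llm` function is a stub standing in for a real model call (e.g. Phi-2), and the class and prompt names are invented for the example.

```python
def llm(prompt: str, x: str) -> str:
    """Stub standing in for a stochastic LLM call; a real layer
    would query a language model such as Phi-2."""
    return f"[{prompt}] {x}"


class LanguageLayer:
    """A layer whose learnable parameter is a natural-language prompt."""

    def __init__(self, prompt: str):
        self.prompt = prompt  # the "weight" tuned during DLN training

    def __call__(self, x: str) -> str:
        return llm(self.prompt, x)


class DLN:
    """Two stacked language layers: layer 1's output feeds layer 2."""

    def __init__(self, prompt1: str, prompt2: str):
        self.layer1 = LanguageLayer(prompt1)
        self.layer2 = LanguageLayer(prompt2)

    def forward(self, x: str) -> str:
        return self.layer2(self.layer1(x))


net = DLN("rephrase the input", "answer the question")
print(net.forward("What is 2+2?"))
# → [answer the question] [rephrase the input] What is 2+2?
```

In the real project, training adjusts each layer's prompt rather than numeric weights, which is what makes the prompts the network's learnable parameters.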
MIT License

Phi-2 #48

Closed: matheper closed this 7 months ago

matheper commented 7 months ago

This PR contains a set of small improvements and adaptations made for experiments with the Phi-2 model. Most of this code was previously reviewed and merged into the phi2 branch; it is now being merged into the main branch.

TODO in a separate PR:

matheper commented 7 months ago

Thanks for re-reviewing everything @MarcCote!