We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural language prompts at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a Deep Language Network (DLN).
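A minimal sketch of the two-layer idea described above; `call_llm`, the class name, and the example prompts are hypothetical stand-ins, not the repo's actual API:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real client."""
    return f"<model output for a prompt of {len(prompt)} chars>"

class LanguageLayer:
    """A 'layer' whose learnable parameter is a natural-language prompt."""

    def __init__(self, prompt: str):
        self.prompt = prompt  # the layer's learnable prompt

    def __call__(self, x: str) -> str:
        # Condition the LLM on the layer's prompt plus the input text.
        return call_llm(f"{self.prompt}\n\n{x}")

# Two stacked layers: the output of layer 1 is fed as input to layer 2.
layer1 = LanguageLayer("Rewrite the input as a step-by-step reasoning trace.")
layer2 = LanguageLayer("Given the reasoning trace, answer the question.")

def dln_forward(x: str) -> str:
    h = layer1(x)      # stochastic hidden state, expressed in natural language
    return layer2(h)   # final output
```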
This PR contains a set of small improvements and adaptations made for experiments using the Phi-2 model. Most of this code was previously reviewed and merged into the `phi2` branch. It is now being merged into the `main` branch.
- Load model configs from YAML (see the config-loading sketch after this list)
- Fix a hard-coded tokenization cleanup in `dln.score`
- Mini-batch calls to vLLM models (see the batching sketch after this list)
- Check and fail early if a request is larger than the model's maximum sequence length (also covered in the batching sketch)
- Add the v3.6 backward template
- Add top-k to `read_results`
- Add new scripts
- Document the DLN components in the main README
- Document and add type hints to `dln.datasets`
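A minimal sketch of the YAML config loading mentioned above, assuming PyYAML; the field names are illustrative, not the repo's actual config schema:

```python
import yaml

# Inline example standing in for a config file such as a hypothetical
# `configs/phi-2.yaml`; field names are illustrative only.
EXAMPLE_YAML = """
model_name: microsoft/phi-2
max_seq_length: 2048
temperature: 0.7
"""

config = yaml.safe_load(EXAMPLE_YAML)
print(config["model_name"])  # -> microsoft/phi-2
```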
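A sketch combining the mini-batching and early length-check items from the list above; `tokenize` and `score_batch` are hypothetical stand-ins for the actual vLLM client code:

```python
from typing import Callable, List

def batched_score(
    prompts: List[str],
    tokenize: Callable[[str], List[int]],
    score_batch: Callable[[List[str]], List[float]],
    max_seq_length: int,
    batch_size: int = 8,
) -> List[float]:
    # Validate every request up front, so we fail before any expensive call.
    for i, p in enumerate(prompts):
        n = len(tokenize(p))
        if n > max_seq_length:
            raise ValueError(
                f"Request {i} has {n} tokens, exceeding the model's "
                f"maximum sequence length of {max_seq_length}."
            )
    # Issue mini-batch calls instead of one call per prompt.
    scores: List[float] = []
    for start in range(0, len(prompts), batch_size):
        scores.extend(score_batch(prompts[start:start + batch_size]))
    return scores
```

Failing before the first batch keeps a single oversized request from wasting an entire scoring run.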
TODO in a separate PR:

- Organize scripts in a better way / move them into separate folders
- Document the OpenAI change that no longer allows requesting `echo` and `logprobs` at the same time (see the sketch after this list)
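For context on the TODO above, a sketch of the scoring pattern that the OpenAI change breaks; this is written against the legacy Completions API, so treat the exact call as illustrative rather than current:

```python
import openai

# Requesting `echo` together with `logprobs` used to return per-token
# log-probabilities for the prompt itself, which is how a completion
# can be scored without generating any new tokens.
response = openai.Completion.create(
    model="davinci-002",
    prompt="The capital of France is Paris.",
    max_tokens=0,   # generate nothing; we only want the prompt's logprobs
    echo=True,      # echo the prompt back in the response...
    logprobs=1,     # ...together with per-token log-probabilities
)
# The API now rejects `echo` combined with `logprobs`, so this request
# fails and the scoring path needs an alternative (to be documented).
```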