microsoft / deep-language-networks

We view Large Language Models as stochastic language layers in a network, where the learnable parameters are the natural-language prompts at each layer. We stack two such layers, feeding the output of one layer into the next, and call the stacked architecture a Deep Language Network (DLN).
MIT License
91 stars · 13 forks
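
To make the layer-stacking described above concrete, here is a minimal sketch of a two-layer forward pass. The client wrapper, model name, and prompt texts are illustrative assumptions, not the repo's actual API:

```python
# Minimal sketch of a two-layer DLN forward pass; the client, model name,
# and prompts are hypothetical, not this repo's implementation.
from openai import OpenAI

client = OpenAI()

def language_layer(prompt: str, x: str) -> str:
    """One stochastic language layer: an LLM call whose learnable
    parameter is the natural-language prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": x},
        ],
    )
    return response.choices[0].message.content

# Learnable parameters: one prompt per layer.
prompt_1 = "Rewrite the input as an explicit step-by-step reasoning trace."
prompt_2 = "Given the reasoning trace, state the final answer only."

def dln2(x: str) -> str:
    hidden = language_layer(prompt_1, x)      # layer 1 output = hidden state
    return language_layer(prompt_2, hidden)   # layer 2 produces the output
```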

Does DLN-2 require twice as much GPU memory? #57

Open pitilessj opened 1 month ago

pitilessj commented 1 month ago

When running DLN-2 against a local model served with vLLM, I get an out-of-memory error, while the same experiment succeeds with DLN-1. Is there any solution? Thanks.
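
Not an official fix, but one common mitigation, sketched below under the assumption that the local model is loaded through vLLM's Python `LLM` class: cap the engine's GPU memory reservation and context length, and share a single engine across both layers (if each layer instantiates its own engine, the weights and KV-cache are allocated twice, which would roughly double memory). The model name and parameter values here are illustrative:

```python
# Sketch of memory-conscious vLLM usage; values are illustrative, not tuned.
from vllm import LLM, SamplingParams

# One shared engine for both DLN layers, rather than one engine per layer,
# so the model weights and KV-cache are allocated only once.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # hypothetical local model
    gpu_memory_utilization=0.85,       # reserve less of the GPU (default 0.9)
    max_model_len=2048,                # shorter context -> smaller KV-cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)

def layer(prompt: str, x: str) -> str:
    out = llm.generate([f"{prompt}\n\n{x}"], params)
    return out[0].outputs[0].text

# DLN-2 forward pass reusing the single engine for both layers.
hidden = layer("Think step by step about the input.", "2 + 2 * 3 = ?")
answer = layer("Give only the final answer.", hidden)
```

If memory is still tight, lowering `max_model_len` further or using a smaller checkpoint for one of the layers are other knobs worth trying.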