nikolas-rauscher / ARDI-Scholarly-QALD

ARDI-Project Scholarly QALD Challenge 2024
GNU General Public License v3.0

Loading and running 3 LLMs with single kernel (48 GB GPU) #36

Closed sefeoglu closed 3 months ago

sefeoglu commented 3 months ago

Hi @nikolas-rauscher, running 3 LLMs in a single kernel with a 48 GB GPU might not be possible, since Llama will have already allocated most of the GPU memory. Could you please change it to a single language model per run? I can change the model name and prompt template via config.ini, and I have also created another issue to fix some parameters. Thanks.

https://github.com/nikolas-rauscher/ARDI-Scholarly-QALD/blob/a6b6691cf3a15fd3f3f67136fd9f06d6302573b1/src/models/zero_shot_prompting_pipeline.py#L185-L197
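A minimal sketch of the config-driven setup sefeoglu describes: each run loads exactly one model, selected from `config.ini`. The section and key names here (`[model]`, `name`, `prompt_template`) are assumptions for illustration, not the repository's actual config schema.

```python
# Hypothetical: pick a single LLM per run from config.ini instead of
# loading all three models in one kernel. Only the stdlib is used here.
import configparser

EXAMPLE_CONFIG = """
[model]
name = llama-3-8b
prompt_template = zero_shot
"""

def load_run_config(text: str) -> dict:
    """Parse the model name and prompt template for this single run."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["model"]
    return {
        "model_name": section.get("name"),
        "prompt_template": section.get("prompt_template"),
    }

if __name__ == "__main__":
    cfg = load_run_config(EXAMPLE_CONFIG)
    print(cfg["model_name"])  # exactly one model is selected for this run
```

Swapping models then only requires editing `config.ini` and re-running, so GPU memory is never shared between models.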

nikolas-rauscher commented 3 months ago

Hey @sefeoglu, have you already tried it? I actually call the models one after the other, so they are completely separate. I have now made sure that the models are no longer kept in memory, so they should run one after the other.

I mean, in the end the result is the same, so we can remove it, but I just thought it would be nicer that way.
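The "one after the other" pattern described above can be sketched as follows. `DummyModel` is a hypothetical stand-in for a real LLM; with PyTorch one would additionally call `torch.cuda.empty_cache()` after `gc.collect()` to return freed blocks to the GPU. The `weakref` check verifies that each model is actually released before the next one is loaded.

```python
# Hypothetical sketch: load a model, run it, drop the reference, and force
# garbage collection before loading the next model, so only one model is
# ever resident at a time.
import gc
import weakref

class DummyModel:
    """Stand-in for an LLM; a real one would hold large GPU tensors."""
    def __init__(self, name: str):
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"{self.name}: {prompt}"

def run_models_sequentially(names, prompt):
    answers = []
    for name in names:
        model = DummyModel(name)              # load the current model
        answers.append(model.generate(prompt))
        ref = weakref.ref(model)
        del model                             # drop the only strong reference
        gc.collect()                          # collect before the next load
        assert ref() is None, "model is still resident in memory"
        # With PyTorch, also: torch.cuda.empty_cache()
    return answers

if __name__ == "__main__":
    print(run_models_sequentially(["llama", "mistral", "falcon"], "Q?"))
```

This is the behavior nikolas-rauscher describes: the three models never coexist in memory, so peak usage stays at a single model.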