randaller / llama-chat

Chat with Meta's LLaMA models at home made easy

How should I use multiple GPUs? #15

Open Chting opened 1 year ago

Chting commented 1 year ago

I'm testing 65B. One A100 is too slow. I want to use two or four GPUs.

randaller commented 1 year ago

The original Meta's repo works with A100s as well.

Chting commented 1 year ago

The original Meta's repo works with A100s as well.

It stipulates that the 65B model must use 8 GPUs.

felipemeres commented 1 year ago

I'm also trying to figure out how to run with 2 GPUs.

wgimperial commented 1 year ago

The original Meta's repo works with A100s as well.

How can I run example-chat.py with 8 A100s?

Chting commented 1 year ago

I'm also trying to figure out how to run with 2 GPUs.

If you succeed, please let me know.

bitRAKE commented 1 year ago

Reshard the model for the number of GPUs you have: https://gist.github.com/benob/4850a0210b01672175942203aa36d300

Resharding the larger models to one file can improve load times in general.
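For anyone who wants the general idea without opening the gist: merging the Meta shards into a single checkpoint boils down to concatenating each tensor along the dimension it was model-parallel-split on. The sketch below is a rough reconstruction of that merge step (the DIM0/DIM1 key lists and the merge() helper are my own assumptions based on the standard LLaMA checkpoint layout, not code taken from the gist, and it loads every shard into RAM at once, which is memory-hungry for 65B):

```python
# Rough sketch: merge consolidated.XX.pth shards into one checkpoint.
# Assumes the standard Meta LLaMA layout; key lists below are assumptions.
import glob
import torch

# Column-parallel weights: split along dim 0 in the original model-parallel code.
DIM0 = ("wq.weight", "wk.weight", "wv.weight", "w1.weight", "w3.weight", "output.weight")
# Row-parallel / parallel-embedding weights: split along dim 1.
DIM1 = ("wo.weight", "w2.weight", "tok_embeddings.weight")

def merge(shard_dir: str) -> dict:
    shards = [torch.load(p, map_location="cpu")
              for p in sorted(glob.glob(f"{shard_dir}/consolidated.*.pth"))]
    merged = {}
    for key in shards[0]:
        if key.endswith(DIM0):
            merged[key] = torch.cat([s[key] for s in shards], dim=0)
        elif key.endswith(DIM1):
            merged[key] = torch.cat([s[key] for s in shards], dim=1)
        else:
            # Norms and similar tensors are replicated on every shard.
            merged[key] = shards[0][key]
    return merged

torch.save(merge("65B"), "65B-merged.pth")  # output path is up to you
```

Resharding to N files is the same idea in reverse: slice each of those tensors into N chunks along the same dimension.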

randaller commented 1 year ago

@Chting @wgimperial @fmeres now you may try to run the HF version on more than one GPU.
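For reference, with HF-format weights the usual way to spread the model over several GPUs is accelerate's device_map="auto". A minimal sketch, assuming `transformers` and `accelerate` are installed and the model path is a placeholder for your converted checkpoint:

```python
# Minimal multi-GPU sketch for HF-format LLaMA weights.
# Requires: pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-65b-hf"  # placeholder: directory with the converted HF weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate splits layers across all visible GPUs
)

prompt = "User: How do I run 65B on two GPUs?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With device_map="auto" the layers are placed across whatever GPUs are visible (and spill to CPU if they don't fit), so the same script works for 2, 4, or 8 cards.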

Chting commented 1 year ago

@Chting @wgimperial @fmeres now you may try to run the HF version on more than one GPU.

Thank you very much