randaller / llama-chat

Chat with Meta's LLaMA models at home made easy

How should I use multiple GPUs? #15

Open Chting opened 1 year ago

Chting commented 1 year ago

I'm testing 65B. One A100 is too slow. I want to use two or four GPUs.

randaller commented 1 year ago

The original Meta's repo works with A100s as well.

Chting commented 1 year ago

The original Meta's repo works with A100s as well.

It stipulates that the 65B model must use 8 GPUs.

felipemeres commented 1 year ago

I'm also trying to figure out how to run with 2 GPUs.

wgimperial commented 1 year ago

The original Meta's repo works with A100s as well.

How can I run example-chat.py with 8 A100s?

Chting commented 1 year ago

I'm also trying to figure out how to run with 2 GPUs.

If you succeed, please let me know.

bitRAKE commented 1 year ago

Reshard the model for the number of GPUs you have: https://gist.github.com/benob/4850a0210b01672175942203aa36d300

Resharding the larger models to one file can improve load times in general.
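For anyone who wants the general idea without opening the gist: merging the Meta shards into a single checkpoint boils down to concatenating each tensor along the dimension it was model-parallel-split on. The sketch below is a rough reconstruction of that merge step (the DIM0/DIM1 key lists and the merge() helper are my own assumptions based on the standard LLaMA checkpoint layout, not code taken from the gist, and it loads every shard into RAM at once, which is memory-hungry for 65B):

```python
# Rough sketch: merge consolidated.XX.pth shards into one checkpoint.
# Assumes the standard Meta LLaMA layout; key lists below are assumptions.
import glob
import torch

# Column-parallel weights: split along dim 0 in the original model-parallel code.
DIM0 = ("wq.weight", "wk.weight", "wv.weight", "w1.weight", "w3.weight", "output.weight")
# Row-parallel / parallel-embedding weights: split along dim 1.
DIM1 = ("wo.weight", "w2.weight", "tok_embeddings.weight")

def merge(shard_dir: str) -> dict:
    shards = [torch.load(p, map_location="cpu")
              for p in sorted(glob.glob(f"{shard_dir}/consolidated.*.pth"))]
    merged = {}
    for key in shards[0]:
        if key.endswith(DIM0):
            merged[key] = torch.cat([s[key] for s in shards], dim=0)
        elif key.endswith(DIM1):
            merged[key] = torch.cat([s[key] for s in shards], dim=1)
        else:
            # Norms and similar tensors are replicated on every shard.
            merged[key] = shards[0][key]
    return merged

torch.save(merge("65B"), "65B-merged.pth")  # output path is up to you
```

Resharding to N files is the same idea in reverse: slice each of those tensors into N chunks along the same dimension.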

randaller commented 1 year ago

@Chting @wgimperial @fmeres now you may try to run the HF version on more than one GPU.
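For reference, with HF-format weights the usual way to spread the model over several GPUs is accelerate's device_map="auto". A minimal sketch, assuming `transformers` and `accelerate` are installed and the model path is a placeholder for your converted checkpoint:

```python
# Minimal multi-GPU sketch for HF-format LLaMA weights.
# Requires: pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-65b-hf"  # placeholder: directory with the converted HF weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate splits layers across all visible GPUs
)

prompt = "User: How do I run 65B on two GPUs?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With device_map="auto" the layers are placed across whatever GPUs are visible (and spill to CPU if they don't fit), so the same script works for 2, 4, or 8 cards.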

Chting commented 1 year ago

@Chting @wgimperial @fmeres now you may try to run the HF version on more than one GPU.

Thank you very much