johnsmith0031 / alpaca_lora_4bit

MIT License
533 stars · 84 forks

fine tune with 2 GPU #118

Open shawei3000 opened 1 year ago

shawei3000 commented 1 year ago

I have 2 A6000 (48GB) GPUs, no NVLink, and am trying to fine-tune the 65B 4-bit GPTQ LLaMA model:

```python
model, tokenizer = load_llama_model_4bit_low_ram_and_offload(
    model_dire, model,
    device_map='auto',
    groupsize=-1,
    is_v1_model=False,
    max_memory={0: '43Gib', 1: '43Gib', 'cpu': '48Gib'},
)

lora_config = LoraConfig(
    r=ft_config.lora_r,                   # 8
    lora_alpha=ft_config.lora_alpha,      # 16
    target_modules=["q_proj", "v_proj"],
    lora_dropout=ft_config.lora_dropout,  # 0.05
    bias="none",
    task_type="CAUSAL_LM",
)
if ft_config.lora_apply_dir is None:      # None
    model = get_peft_model(model, lora_config)
```

The problem is that the 2nd GPU is never used (I was monitoring nvidia-smi). With 1 GPU, the input token max length has to be reduced to 256 even for a small customized LoRA training dataset. Is there any reason why the 2nd GPU is never used during fine-tuning with the code/settings above?
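For reference, a quick way to check where the layers actually ended up after loading is to count parameters per device. This is only a sketch; it assumes the `model` object returned by the loader above and uses nothing beyond standard PyTorch:

```python
# Sketch: tally parameters per device to see whether cuda:1 received any layers.
from collections import Counter

device_counts = Counter(str(p.device) for p in model.parameters())
print(device_counts)  # e.g. Counter({'cuda:0': ..., 'cpu': ...}) if GPU 1 is unused
```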

rakovskij-stanislav commented 1 year ago

Hello, @shawei3000. I have 2×A4000 now, and I can suggest decreasing max_memory for each GPU to about half of the model size (in my case, for 7B, max_memory is "2900Mib"). This way you force the layers to be loaded onto both cards.
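For example (a sketch only, reusing the call from the question; the memory figures are illustrative, not tested values): with 43GiB allowed on GPU 0, the whole 65B 4-bit checkpoint fits on the first card, so the auto device map never needs the second one. Capping each GPU well below the checkpoint size forces the layers to spill onto GPU 1:

```python
# Illustrative sketch: shrink the per-GPU budgets so the 65B 4-bit weights
# (on the order of 35GB) cannot fit on GPU 0 alone and must be split across both GPUs.
model, tokenizer = load_llama_model_4bit_low_ram_and_offload(
    model_dire, model,
    device_map='auto',
    groupsize=-1,
    is_v1_model=False,
    max_memory={0: '20GiB', 1: '20GiB', 'cpu': '48GiB'},  # roughly half the checkpoint per GPU
)
```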

If you find some other project that has better performance, let me know :)

shawei3000 commented 1 year ago

thnx!