tmostak opened this issue 4 months ago
Ok so I did a bit more investigation, in particular logging `max_memory` and `device_map` in `load_cfg_model_tokenizer` in `chat.py`:
logger.info("Before load checkpoint")
with torch.device(cfg.environment._device):
    model = cfg.architecture.model_class(cfg)
    cfg.architecture.pretrained_weights = os.path.join(
        experiment_path, "checkpoint.pth"
    )
    load_checkpoint(cfg, model, strict=False)
logger.info("After load checkpoint")

if device == "cpu_shard":
    max_memory = get_balanced_memory(
        model,
    )
    logger.info("Max Memory: ")
    logger.info(max_memory)
    device_map = infer_auto_device_map(model, max_memory=max_memory)
    logger.info("Device Map: ")
    logger.info(device_map)
    model = dispatch_model(
        model,
        device_map=device_map,
2024-07-25 13:19:52,479 - INFO: Device: cpu
2024-07-25 13:19:53,310 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29])]
2024-07-25 13:19:53,621 - INFO: Before load checkpoint
2024-07-25 13:19:54,354 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29])]
2024-07-25 13:19:54,367 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id 128001.
2024-07-25 13:19:54,367 - INFO: Setting pretraining_tp of model config to 1.
2024-07-25 13:19:54,389 - INFO: Using bfloat16 for backbone
2024-07-25 14:01:00,580 - INFO: Attention implementation: sdpa
2024-07-25 14:01:00,589 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2024-07-25 14:02:59,663 - INFO: Trainable parameters count: 6627000320
2024-07-25 14:02:59,663 - INFO: Total parameters count: 77180706816
2024-07-25 14:02:59,663 - INFO: Trainable %: 8.5863%
2024-07-25 14:05:28,328 - INFO: Weights loaded from: /home/ubuntu/h2o-llmstudio/output/user/heavyiq-llama-3-1-70b-combo-v61-5-no-cte-judge-3584-tokens-lora-r-512-a-1024-lr-1-1e-5.1/checkpoint.pth
2024-07-25 14:05:28,328 - INFO: After load checkpoint
2024-07-25 14:05:30,090 - INFO: Max Memory:
2024-07-25 14:05:30,090 - INFO: {0: 19395466089, 1: 19395466089, 2: 19395466089, 3: 19395466089, 4: 19395466089, 5: 19395466089, 6: 19395466089, 7: 84537507840, 'cpu': 1717507887104}
2024-07-25 14:05:30,536 - INFO: Device Map:
2024-07-25 14:05:30,536 - INFO: OrderedDict([('backbone.base_model.model.model.embed_tokens', 0), ('backbone.base_model.model.model.layers.0', 0), ('backbone.base_model.model.model.layers.1', 0), ('backbone.base_model.model.model.layers.2', 0), ('backbone.base_model.model.model.layers.3', 0), ('backbone.base_model.model.model.layers.4', 0), ('backbone.base_model.model.model.layers.5', 0), ('backbone.base_model.model.model.layers.6', 0), ('backbone.base_model.model.model.layers.7', 0), ('backbone.base_model.model.model.layers.8.self_attn.q_proj', 0), ('backbone.base_model.model.model.layers.8.self_attn.k_proj.base_layer', 0), ('backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_dropout', 0), ('backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_A', 0), ('backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_B.default', 1), ('backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_embedding_A', 1), ('backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_embedding_B', 1), ('backbone.base_model.model.model.layers.8.self_attn.v_proj', 1), ('backbone.base_model.model.model.layers.8.self_attn.o_proj', 1), ('backbone.base_model.model.model.layers.8.self_attn.rotary_emb', 1), ('backbone.base_model.model.model.layers.8.mlp', 1), ('backbone.base_model.model.model.layers.8.input_layernorm', 1), ('backbone.base_model.model.model.layers.8.post_attention_layernorm', 1), ('backbone.base_model.model.model.layers.9', 1), ('backbone.base_model.model.model.layers.10', 1), ('backbone.base_model.model.model.layers.11', 1), ('backbone.base_model.model.model.layers.12', 1), ('backbone.base_model.model.model.layers.13', 1), ('backbone.base_model.model.model.layers.14', 1), ('backbone.base_model.model.model.layers.15', 1), ('backbone.base_model.model.model.layers.16', 1), ('backbone.base_model.model.model.layers.17', 1), ('backbone.base_model.model.model.layers.18.self_attn', 1), ('backbone.base_model.model.model.layers.18.input_layernorm', 2), ('backbone.base_model.model.model.layers.18.post_attention_layernorm', 2), ('backbone.base_model.model.model.layers.19', 2), ('backbone.base_model.model.model.layers.20', 2), ('backbone.base_model.model.model.layers.21', 2), ('backbone.base_model.model.model.layers.22', 2), ('backbone.base_model.model.model.layers.23', 2), ('backbone.base_model.model.model.layers.24', 2), ('backbone.base_model.model.model.layers.25', 2), ('backbone.base_model.model.model.layers.26', 2), ('backbone.base_model.model.model.layers.27', 2), ('backbone.base_model.model.model.layers.28.self_attn', 2), ('backbone.base_model.model.model.layers.28.mlp.gate_proj', 2), ('backbone.base_model.model.model.layers.28.mlp.down_proj', 3), ('backbone.base_model.model.model.layers.28.mlp.act_fn', 3), ('backbone.base_model.model.model.layers.28.input_layernorm', 3), ('backbone.base_model.model.model.layers.28.post_attention_layernorm', 3), ('backbone.base_model.model.model.layers.29', 3), ('backbone.base_model.model.model.layers.30', 3), ('backbone.base_model.model.model.layers.31', 3), ('backbone.base_model.model.model.layers.32', 3), ('backbone.base_model.model.model.layers.33', 3), ('backbone.base_model.model.model.layers.34', 3), ('backbone.base_model.model.model.layers.35', 3), ('backbone.base_model.model.model.layers.36', 3), ('backbone.base_model.model.model.layers.37', 3), ('backbone.base_model.model.model.layers.38.self_attn', 3), ('backbone.base_model.model.model.layers.38.mlp.gate_proj', 3), ('backbone.base_model.model.model.layers.38.mlp.up_proj', 3), 
('backbone.base_model.model.model.layers.38.mlp.act_fn', 4), ('backbone.base_model.model.model.layers.38.input_layernorm', 4), ('backbone.base_model.model.model.layers.38.post_attention_layernorm', 4), ('backbone.base_model.model.model.layers.39', 4), ('backbone.base_model.model.model.layers.40', 4), ('backbone.base_model.model.model.layers.41', 4), ('backbone.base_model.model.model.layers.42', 4), ('backbone.base_model.model.model.layers.43', 4), ('backbone.base_model.model.model.layers.44', 4), ('backbone.base_model.model.model.layers.45', 4), ('backbone.base_model.model.model.layers.46', 4), ('backbone.base_model.model.model.layers.47', 4), ('backbone.base_model.model.model.layers.48', 4), ('backbone.base_model.model.model.layers.50', 5), ('backbone.base_model.model.model.layers.51', 5), ('backbone.base_model.model.model.layers.52', 5), ('backbone.base_model.model.model.layers.53', 5), ('backbone.base_model.model.model.layers.54', 5), ('backbone.base_model.model.model.layers.55', 5), ('backbone.base_model.model.model.layers.56', 5), ('backbone.base_model.model.model.layers.57', 5), ('backbone.base_model.model.model.layers.58', 5), ('backbone.base_model.model.model.layers.59.self_attn', 5), ('backbone.base_model.model.model.layers.59.input_layernorm', 6), ('backbone.base_model.model.model.layers.59.post_attention_layernorm', 6), ('backbone.base_model.model.model.layers.60', 6), ('backbone.base_model.model.model.layers.61', 6), ('backbone.base_model.model.model.layers.62', 6), ('backbone.base_model.model.model.layers.63', 6), ('backbone.base_model.model.model.layers.64', 6), ('backbone.base_model.model.model.layers.65', 6), ('backbone.base_model.model.model.layers.66', 6), ('backbone.base_model.model.model.layers.67', 6), ('backbone.base_model.model.model.layers.68', 6), ('backbone.base_model.model.model.layers.69.self_attn', 6), ('backbone.base_model.model.model.layers.69.mlp.gate_proj', 6), ('backbone.base_model.model.model.layers.69.mlp.down_proj', 7), ('backbone.base_model.model.model.layers.69.mlp.act_fn', 7), ('backbone.base_model.model.model.layers.69.input_layernorm', 7), ('backbone.base_model.model.model.layers.69.post_attention_layernorm', 7), ('backbone.base_model.model.model.layers.70', 7), ('backbone.base_model.model.model.layers.71', 7), ('backbone.base_model.model.model.layers.72', 7), ('backbone.base_model.model.model.layers.73', 7), ('backbone.base_model.model.model.layers.74', 7), ('backbone.base_model.model.model.layers.75', 7), ('backbone.base_model.model.model.layers.76', 7), ('backbone.base_model.model.model.layers.77', 7), ('backbone.base_model.model.model.layers.78', 7), ('backbone.base_model.model.model.layers.79', 7), ('backbone.base_model.model.model.norm', 7), ('backbone.base_model.model.model.rotary_emb', 7), ('backbone.base_model.model.lm_head', 7), ('loss_fn', 7), ('perplexity', 7), ('backbone.base_model.model.model.layers.18.mlp', 2), ('backbone.base_model.model.model.layers.38.mlp.down_proj', 4), ('backbone.base_model.model.model.layers.49', 5), ('backbone.base_model.model.model.layers.28.mlp.up_proj', 3), ('backbone.base_model.model.model.layers.59.mlp', 6), ('backbone.base_model.model.model.layers.69.mlp.up_proj', 7)])
2024-07-25 14:07:02,059 - INFO: Merging LORA layers with base model.
2024-07-25 14:07:02,263 - ERROR: Unknown exception
Traceback (most recent call last):
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/handlers.py", line 358, in handle
await experiment_push_to_huggingface_dialog(q)
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/experiment.py", line 2015, in experiment_push_to_huggingface_dialog
publish_model_to_hugging_face(
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/hugging_face_utils.py", line 216, in publish_model_to_hugging_face
cfg, model, tokenizer = load_cfg_model_tokenizer(
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/chat.py", line 249, in load_cfg_model_tokenizer
model.backbone = model.backbone.merge_and_unload()
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 838, in merge_and_unload
return self._unload_and_optionally_merge(
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 457, in _unload_and_optionally_merge
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 470, in merge
delta_weight = self.get_delta_weight(active_adapter)
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 533, in get_delta_weight
output_tensor = transpose(weight_B @ weight_A, self.fan_in_fan_out) * self.scaling[adapter]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
What seems to be off is that `max_memory` reports roughly 19.3 GB for each GPU except GPU 7, which gets 84.5 GB.
I wonder if this then messes up the peft `merge_and_unload` logic and causes the tensor-device assignment to get off, causing the error?
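(As a sanity check, something along these lines, assuming `model` here is the dispatched model, should show whether any LoRA A/B pair ended up split across devices:)

```python
# Hypothetical check: list LoRA modules whose lora_A / lora_B weights sit on
# different devices after dispatch_model(); `model` is assumed to be the
# dispatched backbone wrapper.
for name, module in model.named_modules():
    if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
        devices_a = {p.device for p in module.lora_A.parameters()}
        devices_b = {p.device for p in module.lora_B.parameters()}
        if len(devices_a | devices_b) > 1:
            print(f"{name}: lora_A on {devices_a}, lora_B on {devices_b}")
```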
Also note my nvidia-smi output below, taken at the beginning of `merge_and_unload()`, particularly that essentially no GPU memory is assigned to GPU 7 (although I'm not sure if this is an artifact of the GPUs being loaded up sequentially?).
(base) ubuntu@149-130-217-69:~/h2o-llmstudio/output/user/heavyiq-llama-3-1-70b-combo-v61-5-no-cte-judge-3584-tokens-lora-r-512-a-1024-lr-1-1e-5.1$ nvidia-smi
Thu Jul 25 14:06:31 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:08:00.0 Off | 0 |
| N/A 36C P0 72W / 400W | 17081MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:09:00.0 Off | 0 |
| N/A 35C P0 71W / 400W | 18695MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:0A:00.0 Off | 0 |
| N/A 34C P0 72W / 400W | 19005MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:0B:00.0 Off | 0 |
| N/A 35C P0 69W / 400W | 19005MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:0C:00.0 Off | 0 |
| N/A 34C P0 71W / 400W | 19005MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-80GB On | 00000000:0D:00.0 Off | 0 |
| N/A 33C P0 71W / 400W | 18859MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-80GB On | 00000000:0E:00.0 Off | 0 |
| N/A 34C P0 72W / 400W | 11725MiB / 81920MiB | 6% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-80GB On | 00000000:0F:00.0 Off | 0 |
| N/A 36C P0 71W / 400W | 427MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 17068MiB |
| 1 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 18682MiB |
| 2 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 18992MiB |
| 3 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 18992MiB |
| 4 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 18992MiB |
| 5 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 18846MiB |
| 6 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 11712MiB |
| 7 N/A N/A 959317 C ...s/h2o_llm_studio_jul_24/bin/python3 414MiB |
+---------------------------------------------------------------------------------------+
Thoughts or suggestions?
One thing I found that might be related: they basically hit the same issue, and someone noted "i have a similar error on the other model (minicpm), i change the version of deepspeed from 0.14.0 to 0.13.2. and it works".
Going to try downgrading DeepSpeed to see if this helps.
I tried again with deepspeed 0.13.2 and hit the same issue:
2024-07-25 15:14:39,542 - INFO: Weights loaded from: /home/ubuntu/h2o-llmstudio/output/user/heavyiq-llama-3-1-70b-combo-v61-5-no-cte-judge-3584-tokens-lora-r-512-a-1024-lr-1-1e-5.1/checkpoint.pth
2024-07-25 15:16:00,567 - INFO: Merging LORA layers with base model.
2024-07-25 15:16:00,771 - ERROR: Unknown exception
Traceback (most recent call last):
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/handlers.py", line 358, in handle
await experiment_push_to_huggingface_dialog(q)
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/experiment.py", line 2012, in experiment_push_to_huggingface_dialog
publish_model_to_hugging_face(
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/hugging_face_utils.py", line 216, in publish_model_to_hugging_face
cfg, model, tokenizer = load_cfg_model_tokenizer(
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/chat.py", line 241, in load_cfg_model_tokenizer
model.backbone = model.backbone.merge_and_unload()
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 838, in merge_and_unload
return self._unload_and_optionally_merge(
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 457, in _unload_and_optionally_merge
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 470, in merge
delta_weight = self.get_delta_weight(active_adapter)
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 533, in get_delta_weight
output_tensor = transpose(weight_B @ weight_A, self.fan_in_fan_out) * self.scaling[adapter]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
2024-07-25 15:16:00,773 - INFO: {'home/gpu_stats', 'experiment/display/footer', 'experiment/display/charts/train_loss', 'experiment/display/tab', 'experiment/display/charts/validation_Perplexity', 'experiment/display/charts/validation_loss', 'home/experiments_stats', 'dataset/list', 'experiment/list', 'init_app', 'home/disk_usage', 'home/compute_stats', 'dataset/display/footer', 'experiment/display/charts/meta_lr'}
Thank you for the details. We recently upgraded deepspeed, so this could indeed be an issue caused by that. I'll look into it.
@pascal-pfeiffer I wrote a quick Python script to write out the layer names per GPU, and it seems the issue might be how the LoRA layers for layer 8 are split between GPU 0 and GPU 1. Also, why are only the LoRA layers for layer 8 listed, and not those for the other layers?
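(For reference, a minimal sketch of such a script, assuming `device_map` is the OrderedDict logged above, could look like this:)

```python
from collections import defaultdict

# Group module names from the accelerate device_map by their assigned device
# and print them per GPU; `device_map` is the OrderedDict from the log above.
layers_per_device = defaultdict(list)
for module_name, device in device_map.items():
    layers_per_device[device].append(module_name)

for device in sorted(layers_per_device, key=str):
    print(f"GPU {device}:")
    for module_name in sorted(layers_per_device[device]):
        print(f"  {module_name}")
```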
GPU 0:
backbone.base_model.model.model.embed_tokens
backbone.base_model.model.model.layers.0
backbone.base_model.model.model.layers.1
backbone.base_model.model.model.layers.2
backbone.base_model.model.model.layers.3
backbone.base_model.model.model.layers.4
backbone.base_model.model.model.layers.5
backbone.base_model.model.model.layers.6
backbone.base_model.model.model.layers.7
backbone.base_model.model.model.layers.8.self_attn.k_proj.base_layer
backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_A
backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_dropout
backbone.base_model.model.model.layers.8.self_attn.q_proj
GPU 1:
backbone.base_model.model.model.layers.10
backbone.base_model.model.model.layers.11
backbone.base_model.model.model.layers.12
backbone.base_model.model.model.layers.13
backbone.base_model.model.model.layers.14
backbone.base_model.model.model.layers.15
backbone.base_model.model.model.layers.16
backbone.base_model.model.model.layers.17
backbone.base_model.model.model.layers.18.self_attn
backbone.base_model.model.model.layers.8.input_layernorm
backbone.base_model.model.model.layers.8.mlp
backbone.base_model.model.model.layers.8.post_attention_layernorm
backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_B.default
backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_embedding_A
backbone.base_model.model.model.layers.8.self_attn.k_proj.lora_embedding_B
backbone.base_model.model.model.layers.8.self_attn.o_proj
backbone.base_model.model.model.layers.8.self_attn.rotary_emb
backbone.base_model.model.model.layers.8.self_attn.v_proj
backbone.base_model.model.model.layers.9
GPU 2:
backbone.base_model.model.model.layers.18.input_layernorm
backbone.base_model.model.model.layers.18.mlp
backbone.base_model.model.model.layers.18.post_attention_layernorm
backbone.base_model.model.model.layers.19
backbone.base_model.model.model.layers.20
backbone.base_model.model.model.layers.21
backbone.base_model.model.model.layers.22
backbone.base_model.model.model.layers.23
backbone.base_model.model.model.layers.24
backbone.base_model.model.model.layers.25
backbone.base_model.model.model.layers.26
backbone.base_model.model.model.layers.27
backbone.base_model.model.model.layers.28.mlp.gate_proj
backbone.base_model.model.model.layers.28.self_attn
GPU 3:
backbone.base_model.model.model.layers.28.input_layernorm
backbone.base_model.model.model.layers.28.mlp.act_fn
backbone.base_model.model.model.layers.28.mlp.down_proj
backbone.base_model.model.model.layers.28.mlp.up_proj
backbone.base_model.model.model.layers.28.post_attention_layernorm
backbone.base_model.model.model.layers.29
backbone.base_model.model.model.layers.30
backbone.base_model.model.model.layers.31
backbone.base_model.model.model.layers.32
backbone.base_model.model.model.layers.33
backbone.base_model.model.model.layers.34
backbone.base_model.model.model.layers.35
backbone.base_model.model.model.layers.36
backbone.base_model.model.model.layers.37
backbone.base_model.model.model.layers.38.mlp.gate_proj
backbone.base_model.model.model.layers.38.mlp.up_proj
backbone.base_model.model.model.layers.38.self_attn
GPU 4:
backbone.base_model.model.model.layers.38.input_layernorm
backbone.base_model.model.model.layers.38.mlp.act_fn
backbone.base_model.model.model.layers.38.mlp.down_proj
backbone.base_model.model.model.layers.38.post_attention_layernorm
backbone.base_model.model.model.layers.39
backbone.base_model.model.model.layers.40
backbone.base_model.model.model.layers.41
backbone.base_model.model.model.layers.42
backbone.base_model.model.model.layers.43
backbone.base_model.model.model.layers.44
backbone.base_model.model.model.layers.45
backbone.base_model.model.model.layers.46
backbone.base_model.model.model.layers.47
backbone.base_model.model.model.layers.48
GPU 5:
backbone.base_model.model.model.layers.49
backbone.base_model.model.model.layers.50
backbone.base_model.model.model.layers.51
backbone.base_model.model.model.layers.52
backbone.base_model.model.model.layers.53
backbone.base_model.model.model.layers.54
backbone.base_model.model.model.layers.55
backbone.base_model.model.model.layers.56
backbone.base_model.model.model.layers.57
backbone.base_model.model.model.layers.58
backbone.base_model.model.model.layers.59.self_attn
GPU 6:
backbone.base_model.model.model.layers.59.input_layernorm
backbone.base_model.model.model.layers.59.mlp
backbone.base_model.model.model.layers.59.post_attention_layernorm
backbone.base_model.model.model.layers.60
backbone.base_model.model.model.layers.61
backbone.base_model.model.model.layers.62
backbone.base_model.model.model.layers.63
backbone.base_model.model.model.layers.64
backbone.base_model.model.model.layers.65
backbone.base_model.model.model.layers.66
backbone.base_model.model.model.layers.67
backbone.base_model.model.model.layers.68
backbone.base_model.model.model.layers.69.mlp.gate_proj
backbone.base_model.model.model.layers.69.self_attn
GPU 7:
backbone.base_model.model.lm_head
backbone.base_model.model.model.layers.69.input_layernorm
backbone.base_model.model.model.layers.69.mlp.act_fn
backbone.base_model.model.model.layers.69.mlp.down_proj
backbone.base_model.model.model.layers.69.mlp.up_proj
backbone.base_model.model.model.layers.69.post_attention_layernorm
backbone.base_model.model.model.layers.70
backbone.base_model.model.model.layers.71
backbone.base_model.model.model.layers.72
backbone.base_model.model.model.layers.73
backbone.base_model.model.model.layers.74
backbone.base_model.model.model.layers.75
backbone.base_model.model.model.layers.76
backbone.base_model.model.model.layers.77
backbone.base_model.model.model.layers.78
backbone.base_model.model.model.layers.79
backbone.base_model.model.model.norm
backbone.base_model.model.model.rotary_emb
loss_fn
perplexity
Another update: I tried to upload to Hugging Face the Llama 3 (not 3.1) model that I had previously uploaded successfully, and got the same "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)" error as with 3.1, suggesting it was the requirements upgrade that changed things?
Ok so I rolled back to b4d04c057c7b0a4894d57264d2df7e219e234db2 (fix prompt separator) and re-installed the deps (although I had to use transformers 4.43.2 to accommodate the Llama 3.1 RoPE change, while keeping deepspeed 0.13.2), and got the same error when exporting.
I then wrote a script to dump the layer names from the model... note how all the LoRA layers are there, unlike in the output of `device_map`.
Just showing the first 4 layers, but you get the idea:
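(A minimal sketch of such a dump, assuming `backbone` is the underlying Hugging Face model with the peft adapters applied:)

```python
# Hypothetical dump of all parameter names; with LoRA applied, each targeted
# projection shows base_layer.weight plus lora_A.default.weight and
# lora_B.default.weight entries.
for name, _ in backbone.named_parameters():
    print(name)
```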
model.embed_tokens.weight
model.layers.0.self_attn.q_proj.base_layer.weight
model.layers.0.self_attn.q_proj.lora_A.default.weight
model.layers.0.self_attn.q_proj.lora_B.default.weight
model.layers.0.self_attn.k_proj.base_layer.weight
model.layers.0.self_attn.k_proj.lora_A.default.weight
model.layers.0.self_attn.k_proj.lora_B.default.weight
model.layers.0.self_attn.v_proj.base_layer.weight
model.layers.0.self_attn.v_proj.lora_A.default.weight
model.layers.0.self_attn.v_proj.lora_B.default.weight
model.layers.0.self_attn.o_proj.base_layer.weight
model.layers.0.self_attn.o_proj.lora_A.default.weight
model.layers.0.self_attn.o_proj.lora_B.default.weight
model.layers.0.mlp.gate_proj.base_layer.weight
model.layers.0.mlp.gate_proj.lora_A.default.weight
model.layers.0.mlp.gate_proj.lora_B.default.weight
model.layers.0.mlp.up_proj.base_layer.weight
model.layers.0.mlp.up_proj.lora_A.default.weight
model.layers.0.mlp.up_proj.lora_B.default.weight
model.layers.0.mlp.down_proj.base_layer.weight
model.layers.0.mlp.down_proj.lora_A.default.weight
model.layers.0.mlp.down_proj.lora_B.default.weight
model.layers.0.input_layernorm.weight
model.layers.0.post_attention_layernorm.weight
model.layers.1.self_attn.q_proj.base_layer.weight
model.layers.1.self_attn.q_proj.lora_A.default.weight
model.layers.1.self_attn.q_proj.lora_B.default.weight
model.layers.1.self_attn.k_proj.base_layer.weight
model.layers.1.self_attn.k_proj.lora_A.default.weight
model.layers.1.self_attn.k_proj.lora_B.default.weight
model.layers.1.self_attn.v_proj.base_layer.weight
model.layers.1.self_attn.v_proj.lora_A.default.weight
model.layers.1.self_attn.v_proj.lora_B.default.weight
model.layers.1.self_attn.o_proj.base_layer.weight
model.layers.1.self_attn.o_proj.lora_A.default.weight
model.layers.1.self_attn.o_proj.lora_B.default.weight
model.layers.1.mlp.gate_proj.base_layer.weight
model.layers.1.mlp.gate_proj.lora_A.default.weight
model.layers.1.mlp.gate_proj.lora_B.default.weight
model.layers.1.mlp.up_proj.base_layer.weight
model.layers.1.mlp.up_proj.lora_A.default.weight
model.layers.1.mlp.up_proj.lora_B.default.weight
model.layers.1.mlp.down_proj.base_layer.weight
model.layers.1.mlp.down_proj.lora_A.default.weight
model.layers.1.mlp.down_proj.lora_B.default.weight
model.layers.1.input_layernorm.weight
model.layers.1.post_attention_layernorm.weight
model.layers.2.self_attn.q_proj.base_layer.weight
model.layers.2.self_attn.q_proj.lora_A.default.weight
model.layers.2.self_attn.q_proj.lora_B.default.weight
model.layers.2.self_attn.k_proj.base_layer.weight
model.layers.2.self_attn.k_proj.lora_A.default.weight
model.layers.2.self_attn.k_proj.lora_B.default.weight
model.layers.2.self_attn.v_proj.base_layer.weight
model.layers.2.self_attn.v_proj.lora_A.default.weight
model.layers.2.self_attn.v_proj.lora_B.default.weight
model.layers.2.self_attn.o_proj.base_layer.weight
model.layers.2.self_attn.o_proj.lora_A.default.weight
model.layers.2.self_attn.o_proj.lora_B.default.weight
model.layers.2.mlp.gate_proj.base_layer.weight
model.layers.2.mlp.gate_proj.lora_A.default.weight
model.layers.2.mlp.gate_proj.lora_B.default.weight
model.layers.2.mlp.up_proj.base_layer.weight
model.layers.2.mlp.up_proj.lora_A.default.weight
model.layers.2.mlp.up_proj.lora_B.default.weight
model.layers.2.mlp.down_proj.base_layer.weight
model.layers.2.mlp.down_proj.lora_A.default.weight
model.layers.2.mlp.down_proj.lora_B.default.weight
model.layers.2.input_layernorm.weight
model.layers.2.post_attention_layernorm.weight
model.layers.3.self_attn.q_proj.base_layer.weight
model.layers.3.self_attn.q_proj.lora_A.default.weight
model.layers.3.self_attn.q_proj.lora_B.default.weight
model.layers.3.self_attn.k_proj.base_layer.weight
model.layers.3.self_attn.k_proj.lora_A.default.weight
model.layers.3.self_attn.k_proj.lora_B.default.weight
model.layers.3.self_attn.v_proj.base_layer.weight
model.layers.3.self_attn.v_proj.lora_A.default.weight
model.layers.3.self_attn.v_proj.lora_B.default.weight
model.layers.3.self_attn.o_proj.base_layer.weight
model.layers.3.self_attn.o_proj.lora_A.default.weight
model.layers.3.self_attn.o_proj.lora_B.default.weight
model.layers.3.mlp.gate_proj.base_layer.weight
model.layers.3.mlp.gate_proj.lora_A.default.weight
model.layers.3.mlp.gate_proj.lora_B.default.weight
model.layers.3.mlp.up_proj.base_layer.weight
model.layers.3.mlp.up_proj.lora_A.default.weight
model.layers.3.mlp.up_proj.lora_B.default.weight
model.layers.3.mlp.down_proj.base_layer.weight
model.layers.3.mlp.down_proj.lora_A.default.weight
model.layers.3.mlp.down_proj.lora_B.default.weight
model.layers.3.input_layernorm.weight
model.layers.3.post_attention_layernorm.weight
model.layers.4.self_attn.q_proj.base_layer.weight
model.layers.4.self_attn.q_proj.lora_A.default.weight
model.layers.4.self_attn.q_proj.lora_B.default.weight
model.layers.4.self_attn.k_proj.base_layer.weight
model.layers.4.self_attn.k_proj.lora_A.default.weight
model.layers.4.self_attn.k_proj.lora_B.default.weight
model.layers.4.self_attn.v_proj.base_layer.weight
model.layers.4.self_attn.v_proj.lora_A.default.weight
model.layers.4.self_attn.v_proj.lora_B.default.weight
model.layers.4.self_attn.o_proj.base_layer.weight
model.layers.4.self_attn.o_proj.lora_A.default.weight
model.layers.4.self_attn.o_proj.lora_B.default.weight
model.layers.4.mlp.gate_proj.base_layer.weight
model.layers.4.mlp.gate_proj.lora_A.default.weight
model.layers.4.mlp.gate_proj.lora_B.default.weight
model.layers.4.mlp.up_proj.base_layer.weight
model.layers.4.mlp.up_proj.lora_A.default.weight
model.layers.4.mlp.up_proj.lora_B.default.weight
model.layers.4.mlp.down_proj.base_layer.weight
model.layers.4.mlp.down_proj.lora_A.default.weight
model.layers.4.mlp.down_proj.lora_B.default.weight
model.layers.4.input_layernorm.weight
model.layers.4.post_attention_layernorm.weight
Thank you for all the further investigations @tmostak. I am trying to reproduce the issue, starting with default parameters, mostly aligned with the ones you used, and the default dataset. Using the cfg below, I ran a successful training experiment and upload to the Hugging Face Hub.
Everything ran on commit 87c2978698545c758b639fb83e0ceef7e43e91e5, so basically what we have in the v1.9.0 release.
Could you by chance upload a reproducible config using the default dataset where you are facing the issue? Your config above, for example, doesn't include the LoRA settings.
architecture:
    backbone_dtype: bfloat16
    gradient_checkpointing: true
    intermediate_dropout: 0.0
    pretrained: true
    pretrained_weights: ''
augmentation:
    neftune_noise_alpha: 0.0
    random_parent_probability: 0.0
    skip_parent_probability: 0.0
    token_mask_probability: 0.0
dataset:
    add_eos_token_to_answer: true
    add_eos_token_to_prompt: true
    add_eos_token_to_system: true
    answer_column: output
    chatbot_author: H2O.ai
    chatbot_name: h2oGPT
    data_sample: 0.2
    data_sample_choice:
    - Train
    limit_chained_samples: false
    mask_prompt_labels: true
    only_last_answer: false
    parent_id_column: None
    personalize: false
    prompt_column:
    - instruction
    prompt_column_separator: \n\n
    system_column: None
    text_answer_separator: <|answer|>
    text_prompt_start: <|prompt|>
    text_system_start: <|system|>
    train_dataframe: /home/pascal/h2o-llmstudio/data/user/oasst/train_full.pq
    validation_dataframe: None
    validation_size: 0.01
    validation_strategy: automatic
environment:
    compile_model: false
    deepspeed_allgather_bucket_size: 1000000
    deepspeed_method: ZeRO3
    deepspeed_reduce_bucket_size: 1000000
    deepspeed_stage3_param_persistence_threshold: 1000000
    deepspeed_stage3_prefetch_bucket_size: 1000000
    find_unused_parameters: false
    gpus:
    - '0'
    - '1'
    - '2'
    - '3'
    - '4'
    - '5'
    - '6'
    - '7'
    huggingface_branch: main
    mixed_precision: false
    mixed_precision_dtype: bfloat16
    number_of_workers: 8
    seed: -1
    trust_remote_code: true
    use_deepspeed: true
experiment_name: ruby-walrus
llm_backbone: meta-llama/Meta-Llama-3.1-70B
logging:
    logger: None
    neptune_project: ''
output_directory: /home/pascal/h2o-llmstudio/output/user/ruby-walrus/
prediction:
    batch_size_inference: 0
    do_sample: false
    max_length_inference: 256
    max_time: 0.0
    metric: Perplexity
    metric_gpt_model: gpt-3.5-turbo-0301
    metric_gpt_template: general
    min_length_inference: 2
    num_beams: 1
    num_history: 4
    repetition_penalty: 1.0
    stop_tokens: ''
    temperature: 0.0
    top_k: 0
    top_p: 1.0
problem_type: text_causal_language_modeling
tokenizer:
    add_prompt_answer_tokens: false
    max_length: 8096
    padding_quantile: 1.0
    tokenizer_kwargs: '{"use_fast": true, "add_prefix_space": false}'
training:
    attention_implementation: auto
    batch_size: 2
    differential_learning_rate: 1.0e-05
    differential_learning_rate_layers: []
    drop_last_batch: true
    epochs: 1
    evaluate_before_training: false
    evaluation_epochs: 1.0
    freeze_layers: []
    grad_accumulation: 1
    gradient_clip: 0.0
    learning_rate: 0.0001
    lora: true
    lora_alpha: 16
    lora_dropout: 0.05
    lora_r: 4
    lora_target_modules: ''
    lora_unfreeze_layers: []
    loss_function: TokenAveragedCrossEntropy
    optimizer: AdamW
    save_checkpoint: last
    schedule: Cosine
    train_validation_data: false
    use_dora: false
    warmup_epochs: 0.0
    weight_decay: 0.0
Training
[2024-07-26 08:48:22,582] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
[2024-07-26 08:48:22,981] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning
[2024-07-26 08:48:22,982] [INFO] [utils.py:782:see_memory_usage] MA 16.52 GB Max_MA 24.61 GB CA 25.61 GB Max_CA 26 GB
[2024-07-26 08:48:22,983] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 44.94 GB, percent = 2.2%
[2024-07-26 08:48:23,012] [INFO] [stage3.py:130:__init__] Reduce bucket size 1000000
[2024-07-26 08:48:23,012] [INFO] [stage3.py:131:__init__] Prefetch bucket size 1000000
[2024-07-26 08:48:23,289] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-07-26 08:48:23,290] [INFO] [utils.py:782:see_memory_usage] MA 16.52 GB Max_MA 16.52 GB CA 25.61 GB Max_CA 26 GB
[2024-07-26 08:48:23,290] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 44.94 GB, percent = 2.2%
Parameter Offload: Total persistent parameters: 53092352 in 1281 params
[2024-07-26 08:48:24,352] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-07-26 08:48:24,353] [INFO] [utils.py:782:see_memory_usage] MA 16.44 GB Max_MA 16.52 GB CA 25.61 GB Max_CA 26 GB
[2024-07-26 08:48:24,353] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.17 GB, percent = 2.2%
[2024-07-26 08:48:24,530] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions
[2024-07-26 08:48:24,531] [INFO] [utils.py:782:see_memory_usage] MA 16.44 GB Max_MA 16.44 GB CA 25.61 GB Max_CA 26 GB
[2024-07-26 08:48:24,531] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.18 GB, percent = 2.2%
[2024-07-26 08:48:25,043] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 1
[2024-07-26 08:48:25,044] [INFO] [utils.py:782:see_memory_usage] MA 16.44 GB Max_MA 16.44 GB CA 16.64 GB Max_CA 26 GB
[2024-07-26 08:48:25,045] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.28 GB, percent = 2.2%
[2024-07-26 08:48:25,217] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions
[2024-07-26 08:48:25,217] [INFO] [utils.py:782:see_memory_usage] MA 16.44 GB Max_MA 16.44 GB CA 16.64 GB Max_CA 17 GB
[2024-07-26 08:48:25,218] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.28 GB, percent = 2.2%
[2024-07-26 08:48:25,420] [INFO] [utils.py:781:see_memory_usage] After creating fp32 partitions
[2024-07-26 08:48:25,420] [INFO] [utils.py:782:see_memory_usage] MA 16.46 GB Max_MA 16.48 GB CA 16.64 GB Max_CA 17 GB
[2024-07-26 08:48:25,421] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.28 GB, percent = 2.2%
[2024-07-26 08:48:25,600] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-07-26 08:48:25,600] [INFO] [utils.py:782:see_memory_usage] MA 16.46 GB Max_MA 16.46 GB CA 16.64 GB Max_CA 17 GB
[2024-07-26 08:48:25,601] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.28 GB, percent = 2.2%
[2024-07-26 08:48:25,770] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-07-26 08:48:25,771] [INFO] [utils.py:782:see_memory_usage] MA 16.46 GB Max_MA 16.49 GB CA 16.64 GB Max_CA 17 GB
[2024-07-26 08:48:25,771] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.28 GB, percent = 2.2%
[2024-07-26 08:48:25,771] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized
[2024-07-26 08:48:26,387] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-07-26 08:48:26,388] [INFO] [utils.py:782:see_memory_usage] MA 16.48 GB Max_MA 16.48 GB CA 16.64 GB Max_CA 17 GB
[2024-07-26 08:48:26,388] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 45.3 GB, percent = 2.2%
[...]
[2024-07-26 08:48:26,397] [INFO] [config.py:1001:print] zero_enabled ................. True
[2024-07-26 08:48:26,397] [INFO] [config.py:1001:print] zero_force_ds_cpu_optimizer .. False
[2024-07-26 08:48:26,397] [INFO] [config.py:1001:print] zero_optimization_stage ...... 3
[2024-07-26 08:48:26,397] [INFO] [config.py:987:print_user_config] json = {
"fp16": {
"enabled": false,
"loss_scale_window": 100
},
"bf16": {
"enabled": true,
"loss_scale_window": 100
},
"zero_force_ds_cpu_optimizer": false,
"zero_optimization": {
"overlap_comm": true,
"contiguous_gradients": true,
"reduce_bucket_size": 1.000000e+06,
"stage": 3,
"stage3_prefetch_bucket_size": 1.000000e+06,
"stage3_param_persistence_threshold": 1.000000e+06,
"stage3_gather_16bit_weights_on_model_save": true
},
"steps_per_print": 2.000000e+03,
"train_micro_batch_size_per_gpu": 2,
"gradient_accumulation_steps": 1,
"wall_clock_breakdown": false
}
2024-07-26 08:48:26,466 - INFO: Evaluation step: 161
2024-07-26 08:48:26,553 - INFO: Evaluation step: 161
2024-07-26 08:48:26,557 - INFO: Evaluation step: 161
2024-07-26 08:48:26,582 - INFO: Evaluation step: 161
2024-07-26 08:48:26,590 - INFO: Evaluation step: 161
2024-07-26 08:48:26,615 - INFO: Evaluation step: 161
2024-07-26 08:48:26,637 - INFO: Evaluation step: 161
2024-07-26 08:48:26,675 - INFO: Training Epoch: 1 / 1
2024-07-26 08:48:26,675 - INFO: train loss: 0%| | 0/161 [00:00<?, ?it/s]
2024-07-26 08:48:26,807 - INFO: Evaluation step: 161
2024-07-26 08:48:28,215 - INFO: Stop token ids: [tensor([ 27, 91, 9399, 91, 29]), tensor([ 27, 91, 9125, 91, 29]), tensor([ 27, 91,
41681, 91, 29])]
2024-07-26 08:49:08,638 - INFO: train loss: 1.14: 5%|4 | 8/161 [00:41<13:22, 5.25s/it]
2024-07-26 08:49:23,998 - INFO: train loss: 1.14: 5%|4 | 8/161 [00:57<13:22, 5.25s/it]
2024-07-26 08:49:28,649 - INFO: train loss: 1.13: 10%|9 | 16/161 [01:01<08:46, 3.63s/it
[...]
2024-07-26 08:54:54,019 - INFO: train loss: 1.16: 84%|########4 | 136/161 [06:27<01:08, 2.75s/it]
2024-07-26 08:55:01,619 - INFO: train loss: 1.00: 89%|########9 | 144/161 [06:34<00:45, 2.69s/it]
2024-07-26 08:55:14,021 - INFO: train loss: 1.00: 89%|########9 | 144/161 [06:47<00:45, 2.69s/it]
2024-07-26 08:55:22,118 - INFO: train loss: 0.97: 94%|#########4| 152/161 [06:55<00:23, 2.65s/it]
2024-07-26 08:55:34,023 - INFO: train loss: 0.97: 94%|#########4| 152/161 [07:07<00:23, 2.65s/it]
2024-07-26 08:55:43,204 - INFO: train loss: 1.21: 99%|#########9| 160/161 [07:16<00:02, 2.65s/it]
2024-07-26 08:55:46,551 - INFO: Saving last model checkpoint to /home/pascal/h2o-llmstudio/output/user/ruby-walrus/
2024-07-26 08:55:54,024 - INFO: train loss: 1.15: 100%|##########| 161/161 [07:27<00:00, 2.65s/it]
[2024-07-26 08:56:53,065] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,065] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,066] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,065] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,066] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,066] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,066] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[2024-07-26 08:56:53,069] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step0 is about to be saved!
[2024-07-26 08:56:53,069] [INFO] [engine.py:3591:save_16bit_model] Saving model weights to /home/pascal/h2o-llmstudio/output/user/ruby-walrus/checkpoint.pth, tag: global_step0
[2024-07-26 08:56:53,070] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/pascal/h2o-llmstudio/output/user/ruby-walrus/checkpoint.pth
...
[2024-07-26 08:59:22,348] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/pascal/h2o-llmstudio/output/user/ruby-walrus/checkpoint.pth.
[2024-07-26 08:59:22,349] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step0 is ready now!
[...]
2024-07-26 09:01:21,682 - INFO: Starting validation inference
2024-07-26 09:01:21,683 - INFO: validation progress: 0%| | 0/9 [00:00<?, ?it/s]
2024-07-26 09:01:24,152 - INFO: validation progress: 11%|#1 | 1/9 [00:02<00:19, 2.47s/it]
2024-07-26 09:01:26,059 - INFO: validation progress: 22%|##2 | 2/9 [00:04<00:14, 2.14s/it]
2024-07-26 09:01:26,915 - INFO: validation progress: 33%|###3 | 3/9 [00:05<00:09, 1.55s/it]
2024-07-26 09:01:27,717 - INFO: validation progress: 44%|####4 | 4/9 [00:06<00:06, 1.26s/it]
2024-07-26 09:01:28,391 - INFO: validation progress: 56%|#####5 | 5/9 [00:06<00:04, 1.05s/it]
2024-07-26 09:01:29,159 - INFO: validation progress: 67%|######6 | 6/9 [00:07<00:02, 1.05it/s]
2024-07-26 09:01:29,803 - INFO: validation progress: 78%|#######7 | 7/9 [00:08<00:01, 1.17it/s]
2024-07-26 09:01:30,438 - INFO: validation progress: 89%|########8 | 8/9 [00:08<00:00, 1.28it/s]
2024-07-26 09:01:31,069 - INFO: validation progress: 100%|##########| 9/9 [00:09<00:00, 1.36it/s]
2024-07-26 09:01:31,077 - INFO: validation progress: 100%|##########| 9/9 [00:09<00:00, 1.04s/it]
2024-07-26 09:01:31,103 - INFO: Validation Perplexity: 18.37264
2024-07-26 09:01:31,103 - INFO: Mean validation loss: 1.09179
2024-07-26 09:01:34,473 - INFO: train loss: 1.15: 100%|##########| 161/161 [13:07<00:00, 4.89s/it]
[2024-07-26 09:01:38,110] [INFO] [launch.py:351:main] Process 1912825 exits successfully.
[2024-07-26 09:01:38,111] [INFO] [launch.py:351:main] Process 1912823 exits successfully.
[2024-07-26 09:01:38,111] [INFO] [launch.py:351:main] Process 1912821 exits successfully.
[2024-07-26 09:01:39,113] [INFO] [launch.py:351:main] Process 1912822 exits successfully.
[2024-07-26 09:01:39,113] [INFO] [launch.py:351:main] Process 1912824 exits successfully.
[2024-07-26 09:01:39,113] [INFO] [launch.py:351:main] Process 1912826 exits successfully.
[2024-07-26 09:01:39,114] [INFO] [launch.py:351:main] Process 1912827 exits successfully.
[2024-07-26 09:01:41,116] [INFO] [launch.py:351:main] Process 1912820 exits successfully.
[...]
Upload with cpu_shard
2024-07-26 09:07:20,750 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id 128001.
2024-07-26 09:07:20,750 - INFO: Setting pretraining_tp of model config to 1.
2024-07-26 09:07:20,778 - INFO: Using bfloat16 for backbone
2024-07-26 09:36:05,021 - INFO: Attention implementation: sdpa
2024-07-26 09:36:05,026 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2024-07-26 09:36:05,954 - INFO: Trainable parameters count: 51773440
2024-07-26 09:36:05,954 - INFO: Total parameters count: 70605479936
2024-07-26 09:36:05,955 - INFO: Trainable %: 0.0733%
2024-07-26 09:37:46,950 - INFO: Weights loaded from: /home/pascal/h2o-llmstudio/output/user/ruby-walrus/checkpoint.pth
2024-07-26 09:38:23,721 - INFO: Merging LORA layers with base model.
2024-07-26 09:38:24,035 - INFO: Enough space available for saving model weights.Required space: 138607.63MB, Available space: 3927968.00MB.
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/pascal/.cache/huggingface/token
Login successful
README.md: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 7.75k/7.75k [00:00<00:00, 22.1MB/s]
model-00014-of-00030.safetensors: 4.98GB [00:09, 522MB/s]
model-00025-of-00030.safetensors: 4.67GB [00:12, 382MB/s]
model-00024-of-00030.safetensors: 4.98GB [00:11, 449MB/s]
model-00006-of-00030.safetensors: 4.67GB [00:17, 273MB/s]
model-00029-of-00030.safetensors: 4.98GB [00:10, 478MB/s]
model-00023-of-00030.safetensors: 5.01GB [00:09, 534MB/s]
model-00020-of-00030.safetensors: 4.67GB [00:09, 509MB/s]
model-00026-of-00030.safetensors: 4.67GB [00:09, 497MB/s]
model-00013-of-00030.safetensors: 5.01GB [00:09, 511MB/s]
model-00027-of-00030.safetensors: 4.67GB [00:11, 392MB/s]
model-00009-of-00030.safetensors: 4.98GB [00:10, 467MB/s] | 10/30 [01:55<03:43, 11.17s/it]
model-00003-of-00030.safetensors: 5.01GB [00:09, 533MB/s]
model-00019-of-00030.safetensors: 4.98GB [00:09, 534MB/s]
model-00021-of-00030.safetensors: 4.67GB [00:09, 498MB/s]
model-00022-of-00030.safetensors: 4.67GB [00:09, 507MB/s]
model-00030-of-00030.safetensors: 2.11GB [00:04, 510MB/s]
model-00001-of-00030.safetensors: 4.59GB [00:09, 496MB/s]
model-00011-of-00030.safetensors: 4.67GB [00:09, 505MB/s]
model-00016-of-00030.safetensors: 4.67GB [00:09, 516MB/s]
model-00015-of-00030.safetensors: 4.67GB [00:09, 510MB/s]
model-00007-of-00030.safetensors: 4.67GB [00:10, 458MB/s]
model-00028-of-00030.safetensors: 5.01GB [00:09, 532MB/s]
model-00012-of-00030.safetensors: 4.67GB [00:09, 514MB/s]
model-00005-of-00030.safetensors: 4.67GB [00:09, 501MB/s]
model-00010-of-00030.safetensors: 4.67GB [00:11, 402MB/s]
model-00004-of-00030.safetensors: 4.98GB [00:09, 499MB/s]
model-00018-of-00030.safetensors: 5.01GB [00:09, 521MB/s]
model-00002-of-00030.safetensors: 4.67GB [00:09, 492MB/s]
model-00017-of-00030.safetensors: 4.67GB [00:12, 385MB/s]
model-00008-of-00030.safetensors: 5.01GB [00:10, 499MB/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [06:50<00:00, 13.68s/it]
Memory allocation on the GPUs (yes, this indeed isn't freed but that is another issue https://github.com/h2oai/h2o-llmstudio/issues/736)
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:18:00.0 Off | 0 |
| N/A 33C P0 143W / 700W | 17520MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:2A:00.0 Off | 0 |
| N/A 34C P0 133W / 700W | 18969MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:3A:00.0 Off | 0 |
| N/A 35C P0 133W / 700W | 18393MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:5D:00.0 Off | 0 |
| N/A 31C P0 126W / 700W | 18841MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 Off | 00000000:84:00.0 Off | 0 |
| N/A 32C P0 126W / 700W | 18841MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 Off | 00000000:8B:00.0 Off | 0 |
| N/A 34C P0 131W / 700W | 18969MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 Off | 00000000:91:00.0 Off | 0 |
| N/A 35C P0 134W / 700W | 18393MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 32C P0 131W / 700W | 21741MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
Could some of these issues be related to this? https://github.com/huggingface/transformers/pull/32214
Maybe try updating transformers.
It worked for me on current main/v1.9.0, so there seems to be at least one issue that isn't easily reproducible.
Hmm... as a sanity check I started a new instance, redid the dependency install, trained again, and got the same issue. I should note that I did make one change to requirements.txt, setting transformers to the latest 4.43.3 version:
transformers==4.43.3; python_full_version >= '3.8.0'
Full config file
architecture:
    backbone_dtype: bfloat16
    gradient_checkpointing: true
    intermediate_dropout: 0.0
    pretrained: true
    pretrained_weights: ''
augmentation:
    neftune_noise_alpha: 0.0
    random_parent_probability: 0.0
    skip_parent_probability: 0.0
    token_mask_probability: 0.0
dataset:
    add_eos_token_to_answer: true
    add_eos_token_to_prompt: true
    add_eos_token_to_system: true
    answer_column: answer
    chatbot_author: H2O.ai
    chatbot_name: h2oGPT
    data_sample: 1.0
    data_sample_choice:
    - Train
    - Validation
    limit_chained_samples: false
    mask_prompt_labels: true
    only_last_answer: false
    parent_id_column: None
    personalize: false
    prompt_column:
    - prompt
    prompt_column_separator: \n\n
    system_column: None
    text_answer_separator: ''
    text_prompt_start: ''
    text_system_start: <|system|>
    train_dataframe: /home/ubuntu/h2o-llmstudio/data/user/heavyiq_combo_v61_5_no_cte_judgements_3584_tokens_gen1/heavyiq_combo_v61_5_no_cte_judgements_3584_tokens_gen1_train.csv
    validation_dataframe: /home/ubuntu/h2o-llmstudio/data/user/heavyiq_combo_v61_5_no_cte_judgements_3584_tokens_gen1/heavyiq_combo_v61_5_no_cte_judgements_3584_tokens_gen1_eval.csv
    validation_size: 0.01
    validation_strategy: custom
environment:
    compile_model: false
    deepspeed_allgather_bucket_size: 1000000
    deepspeed_method: ZeRO3
    deepspeed_reduce_bucket_size: 1000000
    deepspeed_stage3_param_persistence_threshold: 1000000
    deepspeed_stage3_prefetch_bucket_size: 1000000
    find_unused_parameters: false
    gpus:
    - '0'
    - '1'
    - '2'
    - '3'
    - '4'
    - '5'
    - '6'
    - '7'
    huggingface_branch: main
    mixed_precision: false
    mixed_precision_dtype: bfloat16
    number_of_workers: 8
    seed: 2
    trust_remote_code: true
    use_deepspeed: true
experiment_name: heavyai-heavyiq-llama-3.1-70b-combo-v61-5-judge-3584-tokens-lora-r-512-a-1024-lr-1-1e-5.2.1
llm_backbone: meta-llama/Meta-Llama-3.1-70B
logging:
    logger: None
    neptune_project: ''
output_directory: /home/ubuntu/h2o-llmstudio/output/user/heavyai-heavyiq-llama-3.1-70b-combo-v61-5-judge-3584-tokens-lora-r-512-a-1024-lr-1-1e-5.2.1/
prediction:
    batch_size_inference: 0
    do_sample: false
    max_length_inference: 256
    max_time: 0.0
    metric: Perplexity
    metric_gpt_model: gpt-3.5-turbo-0301
    metric_gpt_template: general
    min_length_inference: 768
    num_beams: 1
    num_history: 4
    repetition_penalty: 1.0
    stop_tokens: ''
    temperature: 0.0
    top_k: 0
    top_p: 1.0
problem_type: text_causal_language_modeling
tokenizer:
    add_prompt_answer_tokens: false
    max_length: 4416
    padding_quantile: 1.0
    tokenizer_kwargs: '{"use_fast": true, "add_prefix_space": false}'
training:
    attention_implementation: auto
    batch_size: 1
    differential_learning_rate: 1.0e-05
    differential_learning_rate_layers: []
    drop_last_batch: true
    epochs: 1
    evaluate_before_training: false
    evaluation_epochs: 0.05
    freeze_layers: []
    grad_accumulation: 1
    gradient_clip: 0.0
    learning_rate: 1.2e-05
    lora: true
    lora_alpha: 1024
    lora_dropout: 0.05
    lora_r: 512
    lora_target_modules: ''
    lora_unfreeze_layers: []
    loss_function: TokenAveragedCrossEntropy
    optimizer: AdamW
    save_checkpoint: last
    schedule: Cosine
    train_validation_data: false
    use_dora: false
    warmup_epochs: 0.0
I will try training with a default dataset but not sure how that would make a difference.
Ok, I trained with the default dataset but set `lora_r: 4` and `lora_alpha: 16` per the config shared by @pascal-pfeiffer, and indeed it successfully merged the LoRA and is uploading.
This makes me think there's been some regression (in the underlying peft library, perhaps?) that is causing issues for large LoRA layers.
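(Rough back-of-the-envelope numbers on why the rank could matter for the device map; the 8192 hidden size is the Llama-3.1-70B value, everything else is just an illustrative sketch:)

```python
# Approximate bf16 size of one LoRA A/B pair for a square 8192x8192 projection.
def lora_pair_bytes(in_features: int, out_features: int, r: int, bytes_per_param: int = 2) -> int:
    return (r * in_features + out_features * r) * bytes_per_param

for r in (4, 256, 512):
    mib = lora_pair_bytes(8192, 8192, r) / 2**20
    print(f"r={r}: ~{mib:.1f} MiB per adapted projection")
# r=4: ~0.1 MiB, r=256: ~8.0 MiB, r=512: ~16.0 MiB -- at r=512 the adapter
# tensors are large enough that infer_auto_device_map may start splitting a
# layer's submodules across GPUs instead of keeping them together.
```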
Here's my cfg
(base) ubuntu@164-152-107-167:~/h2o-llmstudio/output/user$ cat llama_3.1_70b_test/cfg.yaml
architecture:
    backbone_dtype: bfloat16
    gradient_checkpointing: true
    intermediate_dropout: 0.0
    pretrained: true
    pretrained_weights: ''
augmentation:
    neftune_noise_alpha: 0.0
    random_parent_probability: 0.0
    skip_parent_probability: 0.0
    token_mask_probability: 0.0
dataset:
    add_eos_token_to_answer: true
    add_eos_token_to_prompt: true
    add_eos_token_to_system: true
    answer_column: output
    chatbot_author: H2O.ai
    chatbot_name: h2oGPT
    data_sample: 0.05
    data_sample_choice:
    - Train
    - Validation
    limit_chained_samples: false
    mask_prompt_labels: true
    only_last_answer: false
    parent_id_column: None
    personalize: false
    prompt_column:
    - instruction
    prompt_column_separator: \n\n
    system_column: None
    text_answer_separator: <|answer|>
    text_prompt_start: <|prompt|>
    text_system_start: <|system|>
    train_dataframe: /home/ubuntu/h2o-llmstudio/data/user/oasst/train_full.pq
    validation_dataframe: None
    validation_size: 0.02
    validation_strategy: automatic
environment:
    compile_model: false
    deepspeed_allgather_bucket_size: 1000000
    deepspeed_method: ZeRO3
    deepspeed_reduce_bucket_size: 1000000
    deepspeed_stage3_param_persistence_threshold: 1000000
    deepspeed_stage3_prefetch_bucket_size: 1000000
    find_unused_parameters: false
    gpus:
    - '0'
    - '1'
    - '2'
    - '3'
    - '4'
    - '5'
    - '6'
    - '7'
    huggingface_branch: main
    mixed_precision: false
    mixed_precision_dtype: bfloat16
    number_of_workers: 8
    seed: 2
    trust_remote_code: true
    use_deepspeed: true
experiment_name: llama_3.1_70b_test
llm_backbone: meta-llama/Meta-Llama-3.1-70B
logging:
    logger: None
    neptune_project: ''
output_directory: /home/ubuntu/h2o-llmstudio/output/user/llama_3.1_70b_test/
prediction:
    batch_size_inference: 0
    do_sample: false
    max_length_inference: 4096
    max_time: 0.0
    metric: Perplexity
    metric_gpt_model: gpt-3.5-turbo-0301
    metric_gpt_template: general
    min_length_inference: 2
    num_beams: 1
    num_history: 4
    repetition_penalty: 1.0
    stop_tokens: ''
    temperature: 0.0
    top_k: 0
    top_p: 1.0
problem_type: text_causal_language_modeling
tokenizer:
    add_prompt_answer_tokens: false
    max_length: 4864
    padding_quantile: 1.0
    tokenizer_kwargs: '{"use_fast": true, "add_prefix_space": false}'
training:
    attention_implementation: auto
    batch_size: 1
    differential_learning_rate: 1.0e-05
    differential_learning_rate_layers: []
    drop_last_batch: true
    epochs: 1
    evaluate_before_training: false
    evaluation_epochs: 1.0
    freeze_layers: []
    grad_accumulation: 1
    gradient_clip: 0.0
    learning_rate: 1.2e-05
    lora: true
    lora_alpha: 16
    lora_dropout: 0.05
    lora_r: 4
    lora_target_modules: ''
    lora_unfreeze_layers: []
    loss_function: TokenAveragedCrossEntropy
    optimizer: AdamW
    save_checkpoint: last
    schedule: Cosine
    train_validation_data: false
    use_dora: false
    warmup_epochs: 0.0
    weight_decay: 0.0
Would you guys be able to try a bigger lora (i.e. rank 512 alpha 1024) as I did to see if you can repro? I'll try some sizes between 4/16 and 512/1024 to see if I can find the breaking point.
Yes, I am starting up the 512/1024 test right now. That could indeed be an issue then. That is also why I was asking for the LoRA settings earlier, as the default settings seemed to work fine.
So, it seems that very large LoRA layers are split across GPUs, while smaller ones stay on a single GPU, and the deepspeed wrapper isn't gathering them on a single (meta) device.
We will see how we can deal with that and whether there are any workarounds, such as a CPU-only merge.
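(One possible direction, just a sketch rather than a confirmed fix, would be to pass `no_split_module_classes` when building the device map in `load_cfg_model_tokenizer`, so that a decoder layer and all of its LoRA submodules always stay on the same GPU:)

```python
from accelerate import dispatch_model
from accelerate.utils import get_balanced_memory, infer_auto_device_map

# Keep whole decoder blocks on one device so a projection's lora_A/lora_B
# weights cannot be placed on different GPUs during sharding.
no_split = ["LlamaDecoderLayer"]

max_memory = get_balanced_memory(model, no_split_module_classes=no_split)
device_map = infer_auto_device_map(
    model, max_memory=max_memory, no_split_module_classes=no_split
)
model = dispatch_model(model, device_map=device_map)
```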
Thanks @pascal-pfeiffer... I should just note that I've been training and uploading r512/a1024 models (Llama 3 70B) for some months, so it seems a recent change caused the issues.
Also I tried a CPU-only merge and gave up after nearly 24 hours of waiting.
Ok, to follow up on this, altering my training config from above (https://github.com/h2oai/h2o-llmstudio/issues/782#issuecomment-2258743302) to use LoRA Rank 256 and Alpha 512 worked, but when I changed it to Rank 512 and Alpha 1024 I got the failure seen before.
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/handlers.py", line 358, in handle
await experiment_push_to_huggingface_dialog(q)
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/experiment.py", line 2015, in experiment_push_to_huggingface_dialog
publish_model_to_hugging_face(
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/hugging_face_utils.py", line 216, in publish_model_to_hugging_face
cfg, model, tokenizer = load_cfg_model_tokenizer(
File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/chat.py", line 241, in load_cfg_model_tokenizer
model.backbone = model.backbone.merge_and_unload()
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 838, in merge_and_unload
return self._unload_and_optionally_merge(
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 457, in _unload_and_optionally_merge
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 470, in merge
delta_weight = self.get_delta_weight(active_adapter)
File "/home/ubuntu/miniconda3/envs/h2o_llm_studio/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 533, in get_delta_weight
output_tensor = transpose(weight_B @ weight_A, self.fan_in_fan_out) * self.scaling[adapter]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
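For anyone blocked on this, a possible stop-gap (a rough, untested sketch, not part of LLM Studio) would be to pull each adapter's lora_A/lora_B weights onto the device of their base layer right before merge_and_unload(), so the weight_B @ weight_A matmul in get_delta_weight sees a single device:

```python
import torch
from peft.tuners.lora import LoraLayer

def align_lora_devices(backbone: torch.nn.Module) -> None:
    """Move each LoRA A/B pair onto its base layer's device before merging."""
    for module in backbone.modules():
        if isinstance(module, LoraLayer):
            device = module.get_base_layer().weight.device
            module.lora_A.to(device)
            module.lora_B.to(device)

# Hypothetical usage, just before the failing call in load_cfg_model_tokenizer:
# align_lora_devices(model.backbone)
# model.backbone = model.backbone.merge_and_unload()
```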
Interesting, I used your config with 512/1024 and was able to merge and upload. Though I have somewhat different GPUs, so maybe it came down to luck whether the layers got split or not.
@pascal-pfeiffer would you be able to list all the versions of packages in your environment?
I checked out this commit (87c2978698545c758b639fb83e0ceef7e43e91e5) when testing and installed a fresh environment. So, https://github.com/h2oai/h2o-llmstudio/blob/87c2978698545c758b639fb83e0ceef7e43e91e5/requirements.txt
Given that this depends on the LoRA size, I have a strong feeling this can be very hardware dependent.
By chance, how much disk space is left on your primary disk? I noticed that the export always uses the primary disk for an intermediate save, which is ~170GB for this model. It could be that this also somehow affects the sharding, as you also saw an unusual distribution across the 8 GPUs.
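For the disk question, something as simple as this would answer it (the ~170GB figure is the intermediate export size mentioned above):

```python
import shutil

# Free space on the primary disk, where the intermediate export is written.
total, used, free = shutil.disk_usage("/")
print(f"/: {free / 1e9:.0f} GB free of {total / 1e9:.0f} GB")
```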
With slightly different config, I was again able to export and upload. So, hard to replicate for me now.
Most recently, I updated
transformers = "==4.43.3"
accelerate = "==0.33.0"
hf-transfer = "==0.1.8"
peft = "==0.12.0"
and the export was fine again (though without hf-transfer, the upload often fails, as you reported earlier). Setting it as an env var is required (#801).
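For reference, the hf-transfer package only takes effect when the corresponding huggingface_hub environment variable is set before the upload starts; a minimal sketch:

```python
import os

# Must be set before huggingface_hub starts the upload, otherwise the
# faster hf-transfer path is silently skipped (see #801).
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
```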
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:18:00.0 Off | 0 |
| N/A 33C P0 144W / 700W | 20997MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:2A:00.0 Off | 0 |
| N/A 35C P0 135W / 700W | 22509MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:3A:00.0 Off | 0 |
| N/A 36C P0 133W / 700W | 22509MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:5D:00.0 Off | 0 |
| N/A 32C P0 127W / 700W | 22509MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 Off | 00000000:84:00.0 Off | 0 |
| N/A 32C P0 128W / 700W | 22889MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 Off | 00000000:8B:00.0 Off | 0 |
| N/A 35C P0 132W / 700W | 22509MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 Off | 00000000:91:00.0 Off | 0 |
| N/A 36C P0 138W / 700W | 22509MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 33C P0 134W / 700W | 24513MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
I'll do some more testing with even more extreme LoRA dimensions.
LoRA rank 1024 also worked fine. Now I'm thinking it might be something else.
2024-08-01 23:27:36,341 - INFO: Trainable parameters count: 13254000640
2024-08-01 23:27:36,342 - INFO: Total parameters count: 83807707136
2024-08-01 23:27:36,342 - INFO: Trainable %: 15.8148%
[...]
100%|███████████████████████████████████████████████████████████████████████████| 30/30 [05:20<00:00, 10
Though, that was again with the updated dependencies
transformers = "==4.43.3"
accelerate = "==0.33.0"
hf-transfer = "==0.1.8"
peft = "==0.12.0"
For 100% reproducibility, I am on 6755a5866b598ae1832bd6e75b18e207a889bc52 (current main) and updated the dependencies as above. Attached are the Pipfile.lock and my train config.
Just to follow up on this, as a workaround I was able to start LLM Studio with 4 GPUs via the CUDA_VISIBLE_DEVICES environment variable, and it worked fine. Still don't know why it was/is still failing with 8 GPUs, but at least I was able to export my model.
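For anyone else hitting this, the workaround boils down to restricting which GPUs LLM Studio sees before it starts; a minimal sketch, assuming the usual launch flow (the GPU ids are just an example):

```python
import os

# Expose only the first four GPUs to LLM Studio (ids are an example);
# equivalent to prefixing the launch command with CUDA_VISIBLE_DEVICES=0,1,2,3.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
```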
🐛 Bug
Today when attempting to upload a LoRA-trained Llama 3.1 70B model (first time I've trained Llama 3.1), I hit the following during the LoRA merge. Note I used the
cpu_shard
method to upload. I've tried it twice now with the same error.
2024-07-24 17:22:58,705 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29])]
2024-07-24 17:22:59,686 - INFO: Stop token ids: [tensor([ 27, 91, 9125, 91, 29])]
2024-07-24 17:22:59,701 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id 128001.
2024-07-24 17:22:59,701 - INFO: Setting pretraining_tp of model config to 1.
2024-07-24 17:22:59,723 - INFO: Using bfloat16 for backbone
2024/07/24 17:23:07 # {"client":"3f76ec33-3e3f-4837-9673-cda3f39f377f","state":"DISCONNECT","t":"ws_disconnect"}
2024/07/24 17:23:07 # {"addr":"99.68.143.103:49420","client_id":"3f76ec33-3e3f-4837-9673-cda3f39f377f","t":"client_reconnect"}
2024-07-24 18:04:05,704 - INFO: Attention implementation: sdpa
2024-07-24 18:04:05,713 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2024-07-24 18:06:03,026 - INFO: Trainable parameters count: 6627000320
2024-07-24 18:06:03,027 - INFO: Total parameters count: 77180706816
2024-07-24 18:06:03,027 - INFO: Trainable %: 8.5863%
2024-07-24 18:08:56,811 - INFO: Weights loaded from: /home/ubuntu/h2o-llmstudio/output/user/heavyiq-llama-3-1-70b-combo-v61-5-no-cte-judge-3584-tokens-lora-r-512-a-1024-lr-1-1e-5.1/checkpoint.pth
2024-07-24 18:10:15,356 - INFO: Merging LORA layers with base model.
2024-07-24 18:10:15,561 - ERROR: Unknown exception
Traceback (most recent call last):
  File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/handlers.py", line 358, in handle
    await experiment_push_to_huggingface_dialog(q)
  File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/experiment.py", line 2015, in experiment_push_to_huggingface_dialog
    publish_model_to_hugging_face(
  File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/hugging_face_utils.py", line 216, in publish_model_to_hugging_face
    cfg, model, tokenizer = load_cfg_model_tokenizer(
  File "/home/ubuntu/h2o-llmstudio/./llm_studio/app_utils/sections/chat.py", line 241, in load_cfg_model_tokenizer
    model.backbone = model.backbone.merge_and_unload()
  File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 838, in merge_and_unload
    return self._unload_and_optionally_merge(
  File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 457, in _unload_and_optionally_merge
    target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
  File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 470, in merge
    delta_weight = self.get_delta_weight(active_adapter)
  File "/home/ubuntu/miniconda3/envs/h2o_llm_studio_jul_24/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 533, in get_delta_weight
    output_tensor = transpose(weight_B @ weight_A, self.fan_in_fan_out) * self.scaling[adapter]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
To Reproduce
cfg.yaml
architecture:
    backbone_dtype: bfloat16
    gradient_checkpointing: true
    intermediate_dropout: 0.0
    pretrained: true
    pretrained_weights: ''
augmentation:
    neftune_noise_alpha: 0.0
    random_parent_probability: 0.0
    skip_parent_probability: 0.0
    token_mask_probability: 0.0
dataset:
    add_bos_token_to_answer: false
    add_bos_token_to_prompt: false
    add_bos_token_to_system: false
    add_eos_token_to_answer: true
    add_eos_token_to_prompt: false
    add_eos_token_to_system: true
    answer_column: answer
    chatbot_author: H2O.ai
    chatbot_name: h2oGPT
    data_sample: 1.0
    data_sample_choice:
LLM Studio version
a1b2923a1ad29571e4fcdf4e9fdfd505c949bd16 (tip of main)