Closed HuXiLiFeng closed 1 month ago
When using accelerate with FSDP to speed up training, I get the following error:
Exception: Could not find the transformer layer class to wrap in the model.
The accelerate config is as follows:
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: NO_PREFETCH
  fsdp_offload_params: false
  fsdp_sharding_strategy: 3
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: BertLayer
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
Could someone please help take a look at this issue?
fsdp_transformer_layer_cls_to_wrap: XLMRobertaLayer
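As far as I can tell, this exception is raised when the class name given in fsdp_transformer_layer_cls_to_wrap does not match any submodule class in the loaded model, so the value has to be the name of a layer class the model actually contains. A minimal sketch for listing candidate names follows. The helpers module_class_counts and candidate_layer_classes are hypothetical (not part of accelerate or transformers), and the only assumption about the model is that it exposes modules() the way torch.nn.Module does.

```python
from collections import Counter


def module_class_counts(model):
    """Count how often each submodule class name occurs in `model`.

    Assumption: `model` exposes `modules()` like torch.nn.Module does.
    """
    return Counter(type(m).__name__ for m in model.modules())


def candidate_layer_classes(model, min_count=2):
    """Return class names that occur at least `min_count` times.

    Repeated blocks are the usual candidates for
    fsdp_transformer_layer_cls_to_wrap. Names ending in "Layer" are
    listed first, the rest alphabetically.
    """
    counts = module_class_counts(model)
    repeated = [name for name, n in counts.items() if n >= min_count]
    return sorted(repeated, key=lambda name: (not name.endswith("Layer"), name))
```

To use it, load the model in the training script as usual, call print(candidate_layer_classes(model)) before accelerator.prepare, and copy the matching transformer block name into the config.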