ymcui / Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs (phase-2 project) with 64K long-context models
Apache License 2.0
7.01k stars · 571 forks

Is pre-training unsupported on Mac M-series chips? #493

Closed ban-shi-yi-sheng closed 5 months ago

ban-shi-yi-sheng commented 6 months ago

Checklist before submitting

Issue type

Model training and fine-tuning

Base model

Chinese-Alpaca-2-16K (7B/13B)

Operating system

macOS

Detailed description of the problem

Is pre-training unsupported on Mac M-series chips?

Dependency information (required for code-related issues)

# Please paste your dependency information here (inside this code block)

Run log or screenshot

./run_pt.sh
NOTE: Redirects are currently not supported in Windows or MacOs.
/opt/homebrew/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Traceback (most recent call last):
  File "/Users/xxxx/AI/Chinese-LLaMA-Alpaca-2/scripts/training/run_clm_pt_with_peft.py", line 720, in <module>
    main()
  File "/Users/xxxx/AI/Chinese-LLaMA-Alpaca-2/scripts/training/run_clm_pt_with_peft.py", line 375, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
          ^^^^^^^^^^^^^^^
  File "<string>", line 129, in __init__
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/training_args.py", line 1442, in __post_init__
    and (self.device.type != "cuda")
         ^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/training_args.py", line 1887, in device
    return self._setup_devices
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/utils/generic.py", line 54, in __get__
    cached = self.fget(obj)
             ^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/training_args.py", line 1813, in _setup_devices
    self.distributed_state = PartialState(timeout=timedelta(seconds=self.ddp_timeout))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/accelerate/state.py", line 167, in __init__
    assert (
AssertionError: DeepSpeed is not available => install it using `pip3 install deepspeed` or build it from source
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25443) of binary: /opt/homebrew/opt/python@3.11/bin/python3.11
Traceback (most recent call last):
  File "/opt/homebrew/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run_clm_pt_with_peft.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-01-04_20:02:16
  host      : wanrendembp.lan
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 25443)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
ymcui commented 6 months ago

You'll have to look into this yourself. Two options:

  1. llama.cpp's finetune example: https://github.com/ggerganov/llama.cpp/tree/master/examples/finetune
  2. mlx: https://github.com/ml-explore/mlx
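For readers hitting the same wall: the `AssertionError` in the log above is raised by `accelerate`, apparently because the launch script requests DeepSpeed, and DeepSpeed does not install on macOS. A minimal, purely illustrative sketch (this helper is *not* part of the Chinese-LLaMA-Alpaca-2 scripts) of how one might pick a workable backend per machine:

```python
import importlib.util
import platform


def suggest_backend():
    """Heuristic sketch: suggest a training backend for the current machine.

    Illustrative only -- names and the decision logic are assumptions,
    not part of this repository.
    """
    on_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    if not on_apple_silicon:
        # On a typical Linux/CUDA box, the stock DeepSpeed launch path works.
        return "deepspeed"
    # DeepSpeed does not build on macOS, which is what triggers the
    # AssertionError seen in the log; fall back to Apple-friendly options.
    if importlib.util.find_spec("mlx") is not None:
        return "mlx"  # ml-explore/mlx, native to Apple silicon
    return "llama.cpp"  # CPU/Metal finetune example in llama.cpp
```

Calling `suggest_backend()` on an M-series Mac would return `"mlx"` or `"llama.cpp"`, matching the two suggestions above; on other hosts it returns `"deepspeed"`.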
github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] commented 5 months ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

ban-shi-yi-sheng commented 5 months ago

You'll have to look into this yourself. Two options:

  1. llama.cpp's finetune example: https://github.com/ggerganov/llama.cpp/tree/master/examples/finetune
  2. mlx: https://github.com/ml-explore/mlx