intel / torch-ccl

oneCCL Bindings for Pytorch*
BSD 3-Clause "New" or "Revised" License

Is XPU supported in recent versions, or which version should be used? #41

Closed: KepingYan closed this issue 1 year ago

KepingYan commented 1 year ago

I built oneccl_bind_pt from source with 'COMPUTE_BACKEND' set to 'dpcpp'. I found that v1.10.200 is the only intel_extension_for_pytorch release with XPU support, so I installed it with torch 1.10.0a0 (refer to intel-extension-for-pytorch). Then I tried the branch ccl_torch1.10 and changed 'torch_ipex' to 'intel_extension_for_pytorch' in setup.py (setup.py#L119). I then got the error below:

make: *** No rule to make target 'torch_ccl'.  Stop.
Traceback (most recent call last):
  File "…/torch-ccl/setup.py", line 214, in <module>
    setup(
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/command/install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/lzhi/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "…/torch-ccl/setup.py", line 78, in run
    self.build_cmake(ext)
  File "…/torch-ccl/setup.py", line 132, in build_cmake
    check_call(['make', 'torch_ccl'] + build_args, cwd=str(build_dir))
  File "…/miniconda3/envs/pytorch-1.10/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['make', 'torch_ccl', '-j', '128']' returned non-zero exit status 2.

If I bypass it, another error appears when check_call(['make', 'torch_ccl_xpu'] + build_args, cwd=str(build_dir)) runs: make: *** No rule to make target 'torch_ccl_xpu'. Stop. I think this may be because there is no Makefile in the cwd directory 'build/temp.linux-x86_64-cpython-39.libtorch_ccl'. How should I solve this problem?
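The guess about the missing Makefile can be checked before make is invoked. GNU make prints "No rule to make target" both when the directory has no Makefile at all and when a Makefile exists but does not define the target, so the message alone does not tell the two cases apart. Below is a minimal sketch of such a check; the helper names has_makefile and run_make_target are hypothetical and are not part of torch-ccl's setup.py.

```python
import subprocess
from pathlib import Path


def has_makefile(build_dir):
    """Return True if build_dir contains a file named 'Makefile'."""
    return (Path(build_dir) / "Makefile").is_file()


def run_make_target(build_dir, target, build_args=()):
    """Run `make <target>` in build_dir, failing early if no Makefile exists.

    Hypothetical helper for illustration only; it mirrors the
    check_call(['make', target] + build_args, cwd=...) call in the traceback.
    """
    build_dir = Path(build_dir)
    if not has_makefile(build_dir):
        # Without a Makefile, make would only report
        # "No rule to make target", which hides the real cause.
        raise FileNotFoundError(
            f"No Makefile in {build_dir}; the CMake configure step may have "
            "failed or written its output to a different directory."
        )
    subprocess.check_call(["make", target, *build_args], cwd=str(build_dir))
```

Calling run_make_target('build/temp.linux-x86_64-cpython-39.libtorch_ccl', 'torch_ccl', ['-j', '128']) would then raise FileNotFoundError if the configure step never produced a Makefile there, instead of the less specific CalledProcessError.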

chengjunlu commented 1 year ago

Hi @KepingYan, thanks for trying out torch-ccl with XPU.

Right now, torch-ccl in the public repo doesn't support XPU yet.

We are going to release IPEX and torch-ccl for XPU next week.

Please try again then.

chengjunlu commented 1 year ago

Hi @KepingYan ,

The changes to support XPU have just been merged into the master branch. Please check it out.

Thanks,
John

KepingYan commented 1 year ago

@chengjunlu OK. Thanks!