microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] deepspeed.utils.logging: module 'torch.compiler' has no attribute 'is_compiling' #6656

Open Thinksky5124 opened 8 hours ago

Thinksky5124 commented 8 hours ago

Importing deepspeed fails immediately.

Here is the error traceback:

[2024-10-23T03:04:17.778Z]     import deepspeed
[2024-10-23T03:04:17.778Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/__init__.py:25: in <module>
[2024-10-23T03:04:17.779Z]     from . import ops
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/ops/__init__.py:6: in <module>
[2024-10-23T03:04:17.779Z]     from . import adam
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/ops/adam/__init__.py:6: in <module>
[2024-10-23T03:04:17.779Z]     from .cpu_adam import DeepSpeedCPUAdam
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py:8: in <module>
[2024-10-23T03:04:17.779Z]     from deepspeed.utils import logger
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/utils/__init__.py:10: in <module>
[2024-10-23T03:04:17.779Z]     from .groups import *
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/utils/groups.py:28: in <module>
[2024-10-23T03:04:17.779Z]     from deepspeed import comm as dist
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/comm/__init__.py:7: in <module>
[2024-10-23T03:04:17.779Z]     from .comm import *
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/comm/comm.py:31: in <module>
[2024-10-23T03:04:17.779Z]     from deepspeed.comm.ccl import CCLBackend
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/comm/ccl.py:11: in <module>
[2024-10-23T03:04:17.779Z]     from deepspeed.ops.op_builder import NotImplementedBuilder
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/__init__.py:53: in <module>
[2024-10-23T03:04:17.779Z]     this_module.__dict__[member_name] = builder_closure(member_name)
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/__init__.py:41: in builder_closure
[2024-10-23T03:04:17.779Z]     builder = get_accelerator().get_op_builder(member_name)
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/accelerator/real_accelerator.py:219: in get_accelerator
[2024-10-23T03:04:17.779Z]     accel_logger.info(f"Setting ds_accelerator to {ds_accelerator._name} ({ds_set_method})")
[2024-10-23T03:04:17.779Z] /usr/local/python3.10.12/lib/python3.10/logging/__init__.py:1477: in info
[2024-10-23T03:04:17.779Z]     self._log(INFO, msg, args, **kwargs)
[2024-10-23T03:04:17.779Z] /usr/local/python3.10.12/lib/python3.10/logging/__init__.py:1624: in _log
[2024-10-23T03:04:17.779Z]     self.handle(record)
[2024-10-23T03:04:17.779Z] /usr/local/python3.10.12/lib/python3.10/logging/__init__.py:1633: in handle
[2024-10-23T03:04:17.779Z]     if (not self.disabled) and self.filter(record):
[2024-10-23T03:04:17.779Z] /usr/local/python3.10.12/lib/python3.10/logging/__init__.py:823: in filter
[2024-10-23T03:04:17.779Z]     result = f(record) # assume callable - will raise if not
[2024-10-23T03:04:17.779Z] /home/users/xxx/.local/lib/python3.10/site-packages/deepspeed/utils/logging.py:29: in warn_once
[2024-10-23T03:04:17.779Z]     if is_compile_supported() and torch.compiler.is_compiling() and not warn:
[2024-10-23T03:04:17.779Z] E   AttributeError: module 'torch.compiler' has no attribute 'is_compiling'
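
For context, the last frames show the exception escaping from a logging filter (`warn_once`) while `get_accelerator()` logs an info message. The standard `logging` module does not catch exceptions raised inside filters, so the failure propagates to whoever called `logger.info(...)`, here the import itself. A minimal sketch of that behavior, using a hypothetical `broken_filter` and logger name rather than DeepSpeed's actual code:

```python
import logging


def broken_filter(record):
    # Hypothetical stand-in for a filter that touches a missing attribute.
    raise AttributeError("module 'torch.compiler' has no attribute 'is_compiling'")


demo_logger = logging.getLogger("filter_demo")  # illustrative logger name
demo_logger.propagate = False                   # keep the demo isolated from the root logger
demo_logger.addHandler(logging.NullHandler())
demo_logger.setLevel(logging.INFO)
demo_logger.addFilter(broken_filter)


def log_and_capture(logger):
    """Return the error text if logging raises, else None."""
    try:
        logger.info("Setting ds_accelerator")
    except AttributeError as exc:
        return str(exc)
    return None


error_message = log_and_capture(demo_logger)
```
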

The error occurs with deepspeed==0.15.3. Versions <= 0.15.2 work fine.

I think the cause is the torch version: the installed torch (2.1.0) has a `torch.compiler` module but no `is_compiling` attribute. DeepSpeed should still support older torch versions such as 2.1.0 or 1.13.0.
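
One possible way to tolerate older torch builds is to look the attribute up defensively instead of calling it directly. The sketch below is only an illustration, not the DeepSpeed fix: `is_compiling_safe` is a hypothetical helper, and the `SimpleNamespace` objects stand in for torch modules so the example runs without torch installed.

```python
from types import SimpleNamespace


def is_compiling_safe(torch_module) -> bool:
    """Return torch.compiler.is_compiling() if it exists, else False."""
    compiler = getattr(torch_module, "compiler", None)
    is_compiling = getattr(compiler, "is_compiling", None)
    if not callable(is_compiling):
        # Module or attribute is missing: assume we are not compiling.
        return False
    return bool(is_compiling())


# Stand-ins for different torch builds (assumptions for illustration only).
old_torch = SimpleNamespace(compiler=SimpleNamespace())  # has compiler, lacks is_compiling
new_torch = SimpleNamespace(compiler=SimpleNamespace(is_compiling=lambda: True))
no_compiler = SimpleNamespace()                          # no compiler module at all

results = [is_compiling_safe(m) for m in (old_torch, new_torch, no_compiler)]
```

With a real installation, the call would be `is_compiling_safe(torch)` after `import torch`.
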

Environment Info:

timm                          1.0.11
torch                         2.1.0+cu118
torchaudio                    2.1.0+cu118
torchmetrics                  0.5.0
torchvision                   0.16.0+cu118
triton                        2.1.0
deepspeed                     0.15.3
Python                        3.10.12

MzeroMiko commented 6 hours ago

I also hit this issue with torch==2.2.0 and nvidia-cuda-runtime-cu12==12.1.105, while deepspeed==0.15.2 works fine.