securefederatedai / openfl

An open framework for Federated Learning.
https://openfl.readthedocs.io/en/latest/index.html
Apache License 2.0

Encountered AssertionError regarding distutils when running `fx plan initialize` #1017

Open psfoley opened 1 month ago

psfoley commented 1 month ago

Discussed in https://github.com/securefederatedai/openfl/discussions/1016

Originally posted by **hwpang** July 31, 2024

Hi, I'm a new user to OpenFL and am going through the Quick Start example at https://openfl.readthedocs.io/en/latest/get_started/quickstart.html. I was able to follow the example up until `fx plan initialize`, where I encountered the following assertion error:

```
/anaconda/envs/openfl-env/lib/python3.8/site-packages/_distutils_hack/__init__.py:11: UserWarning: Distutils was imported before Setuptools, but importing Setuptools also replaces the `distutils` module in `sys.modules`. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is installed in the traditional way (e.g. not an editable install), and/or make sure that setuptools is always imported before distutils.
  warnings.warn(
/anaconda/envs/openfl-env/lib/python3.8/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
EXCEPTION : /anaconda/envs/openfl-env/lib/python3.8/distutils/core.py
Traceback (most recent call last):
  File "/anaconda/envs/openfl-env/bin/fx", line 8, in <module>
    sys.exit(entry())
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/openfl/interface/cli.py", line 268, in entry
    error_handler(e)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/openfl/interface/cli.py", line 195, in error_handler
    raise error
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/openfl/interface/cli.py", line 266, in entry
    cli(max_content_width=120)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/openfl/interface/plan.py", line 129, in initialize
    data_loader = plan.get_data_loader(collaborator_cname)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/openfl/federated/plan/plan.py", line 390, in get_data_loader
    self.loader_ = Plan.build(**defaults)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/openfl/federated/plan/plan.py", line 194, in build
    module = import_module(module_path)
  File "/anaconda/envs/openfl-env/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang7/code/Users/hpang/Projects/Federated_learning/openfl/my_workspace/src/ptmnist_inmemory.py", line 7, in <module>
    from .mnist_utils import load_mnist_shard
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang7/code/Users/hpang/Projects/Federated_learning/openfl/my_workspace/src/mnist_utils.py", line 10, in <module>
    from torchvision import datasets
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torchvision/models/__init__.py", line 2, in <module>
    from .convnext import *
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torchvision/models/convnext.py", line 8, in <module>
    from ..ops.misc import Conv2dNormActivation, Permute
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torchvision/ops/__init__.py", line 23, in <module>
    from .poolers import MultiScaleRoIAlign
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torchvision/ops/poolers.py", line 10, in <module>
    from .roi_align import roi_align
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torchvision/ops/roi_align.py", line 7, in <module>
    from torch._dynamo.utils import is_compile_supported
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torch/_dynamo/__init__.py", line 2, in <module>
    from . import convert_frame, eval_frame, resume_execution
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 48, in <module>
    from . import config, exc, trace_rules
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torch/_dynamo/exc.py", line 12, in <module>
    from .utils import counters
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 1063, in <module>
    if has_triton_package():
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/torch/utils/_triton.py", line 9, in has_triton_package
    import triton
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/__init__.py", line 8, in <module>
    from .runtime import (
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/runtime/__init__.py", line 1, in <module>
    from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 9, in <module>
    from ..testing import do_bench, do_bench_cudagraph
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/testing.py", line 7, in <module>
    from . import language as tl
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/language/__init__.py", line 4, in <module>
    from . import math
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/language/math.py", line 1, in <module>
    from . import core
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/language/core.py", line 10, in <module>
    from ..runtime.jit import jit
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/runtime/jit.py", line 12, in <module>
    from ..runtime.driver import driver
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/runtime/driver.py", line 1, in <module>
    from ..backends import backends
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/backends/__init__.py", line 50, in <module>
    backends = _discover_backends()
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/backends/__init__.py", line 44, in _discover_backends
    driver = _load_module(name, os.path.join(root, name, 'driver.py'))
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/backends/__init__.py", line 12, in _load_module
    spec.loader.exec_module(module)
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/backends/amd/driver.py", line 7, in <module>
    from triton.runtime.build import _build
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/triton/runtime/build.py", line 8, in <module>
    import setuptools
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 70, in do_override
    ensure_local_distutils()
  File "/anaconda/envs/openfl-env/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 57, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /anaconda/envs/openfl-env/lib/python3.8/distutils/core.py
```

Information that may be relevant:

- Python version: 3.8.19
- OpenFL version: 1.5
- Operating system: Ubuntu 20.04

My environment packages from `conda list`:

```
# packages in environment at /anaconda/envs/openfl-env:
#
# Name                      Version         Build           Channel
_libgcc_mutex               0.1             main
_openmp_mutex               5.1             1_gnu
absl-py                     2.1.0           pypi_0          pypi
anyio                       4.4.0           pypi_0          pypi
argon2-cffi                 23.1.0          pypi_0          pypi
argon2-cffi-bindings        21.2.0          pypi_0          pypi
arrow                       1.3.0           pypi_0          pypi
asttokens                   2.4.1           pypi_0          pypi
async-lru                   2.0.4           pypi_0          pypi
attrs                       23.2.0          pypi_0          pypi
babel                       2.15.0          pypi_0          pypi
backcall                    0.2.0           pypi_0          pypi
beautifulsoup4              4.12.3          pypi_0          pypi
bleach                      6.1.0           pypi_0          pypi
bzip2                       1.0.8           h5eee18b_6
ca-certificates             2024.7.2        h06a4308_0
cachetools                  5.4.0           pypi_0          pypi
certifi                     2024.7.4        pypi_0          pypi
cffi                        1.16.0          pypi_0          pypi
charset-normalizer          3.3.2           pypi_0          pypi
click                       8.1.7           pypi_0          pypi
cloudpickle                 3.0.0           pypi_0          pypi
comm                        0.2.2           pypi_0          pypi
cryptography                43.0.0          pypi_0          pypi
debugpy                     1.8.2           pypi_0          pypi
decorator                   5.1.1           pypi_0          pypi
defusedxml                  0.7.1           pypi_0          pypi
docker                      7.1.0           pypi_0          pypi
dynaconf                    3.2.5           pypi_0          pypi
exceptiongroup              1.2.2           pypi_0          pypi
executing                   2.0.1           pypi_0          pypi
fastjsonschema              2.20.0          pypi_0          pypi
filelock                    3.15.4          pypi_0          pypi
flatten-json                0.1.14          pypi_0          pypi
fqdn                        1.5.1           pypi_0          pypi
fsspec                      2024.6.1        pypi_0          pypi
google-auth                 2.32.0          pypi_0          pypi
google-auth-oauthlib        1.0.0           pypi_0          pypi
grpcio                      1.65.2          pypi_0          pypi
h11                         0.14.0          pypi_0          pypi
httpcore                    1.0.5           pypi_0          pypi
httpx                       0.27.0          pypi_0          pypi
idna                        3.7             pypi_0          pypi
importlib-metadata          8.2.0           pypi_0          pypi
importlib-resources         6.4.0           pypi_0          pypi
ipykernel                   6.29.5          pypi_0          pypi
ipython                     8.12.3          pypi_0          pypi
isoduration                 20.11.0         pypi_0          pypi
jedi                        0.19.1          pypi_0          pypi
jinja2                      3.1.4           pypi_0          pypi
joblib                      1.4.2           pypi_0          pypi
json5                       0.9.25          pypi_0          pypi
jsonpointer                 3.0.0           pypi_0          pypi
jsonschema                  4.23.0          pypi_0          pypi
jsonschema-specifications   2023.12.1       pypi_0          pypi
jupyter-client              8.6.2           pypi_0          pypi
jupyter-core                5.7.2           pypi_0          pypi
jupyter-events              0.10.0          pypi_0          pypi
jupyter-lsp                 2.2.5           pypi_0          pypi
jupyter-server              2.14.2          pypi_0          pypi
jupyter-server-terminals    0.5.3           pypi_0          pypi
jupyterlab                  4.2.4           pypi_0          pypi
jupyterlab-pygments         0.3.0           pypi_0          pypi
jupyterlab-server           2.27.3          pypi_0          pypi
ld_impl_linux-64            2.38            h1181459_1
libffi                      3.4.4           h6a678d5_1
libgcc-ng                   11.2.0          h1234567_1
libgomp                     11.2.0          h1234567_1
libstdcxx-ng                11.2.0          h1234567_1
libuuid                     1.41.5          h5eee18b_0
markdown                    3.6             pypi_0          pypi
markdown-it-py              3.0.0           pypi_0          pypi
markupsafe                  2.1.5           pypi_0          pypi
matplotlib-inline           0.1.7           pypi_0          pypi
mdurl                       0.1.2           pypi_0          pypi
mistune                     3.0.2           pypi_0          pypi
mpmath                      1.3.0           pypi_0          pypi
nbclient                    0.10.0          pypi_0          pypi
nbconvert                   7.16.4          pypi_0          pypi
nbformat                    5.10.4          pypi_0          pypi
ncurses                     6.4             h6a678d5_0
nest-asyncio                1.6.0           pypi_0          pypi
networkx                    3.1             pypi_0          pypi
notebook-shim               0.2.4           pypi_0          pypi
numpy                       1.24.4          pypi_0          pypi
nvidia-cublas-cu12          12.1.3.1        pypi_0          pypi
nvidia-cuda-cupti-cu12      12.1.105        pypi_0          pypi
nvidia-cuda-nvrtc-cu12      12.1.105        pypi_0          pypi
nvidia-cuda-runtime-cu12    12.1.105        pypi_0          pypi
nvidia-cudnn-cu12           9.1.0.70        pypi_0          pypi
nvidia-cufft-cu12           11.0.2.54       pypi_0          pypi
nvidia-curand-cu12          10.3.2.106      pypi_0          pypi
nvidia-cusolver-cu12        11.4.5.107      pypi_0          pypi
nvidia-cusparse-cu12        12.1.0.106      pypi_0          pypi
nvidia-nccl-cu12            2.20.5          pypi_0          pypi
nvidia-nvjitlink-cu12       12.5.82         pypi_0          pypi
nvidia-nvtx-cu12            12.1.105        pypi_0          pypi
oauthlib                    3.2.2           pypi_0          pypi
openfl                      1.5             pypi_0          pypi
openssl                     3.0.14          h5eee18b_0
overrides                   7.7.0           pypi_0          pypi
packaging                   24.1            pypi_0          pypi
pandas                      2.0.3           pypi_0          pypi
pandocfilters               1.5.1           pypi_0          pypi
parso                       0.8.4           pypi_0          pypi
pexpect                     4.9.0           pypi_0          pypi
pickleshare                 0.7.5           pypi_0          pypi
pillow                      10.4.0          pypi_0          pypi
pip                         24.0            py38h06a4308_0
pkgutil-resolve-name        1.3.10          pypi_0          pypi
platformdirs                4.2.2           pypi_0          pypi
prometheus-client           0.20.0          pypi_0          pypi
prompt-toolkit              3.0.47          pypi_0          pypi
protobuf                    3.20.3          pypi_0          pypi
psutil                      6.0.0           pypi_0          pypi
ptyprocess                  0.7.0           pypi_0          pypi
pure-eval                   0.2.3           pypi_0          pypi
pyasn1                      0.6.0           pypi_0          pypi
pyasn1-modules              0.4.0           pypi_0          pypi
pycparser                   2.22            pypi_0          pypi
pygments                    2.18.0          pypi_0          pypi
python                      3.8.19          h955ad1f_0
python-dateutil             2.9.0.post0     pypi_0          pypi
python-json-logger          2.0.7           pypi_0          pypi
pytz                        2024.1          pypi_0          pypi
pyyaml                      6.0.1           pypi_0          pypi
pyzmq                       26.0.3          pypi_0          pypi
readline                    8.2             h5eee18b_0
referencing                 0.35.1          pypi_0          pypi
requests                    2.32.3          pypi_0          pypi
requests-oauthlib           2.0.0           pypi_0          pypi
rfc3339-validator           0.1.4           pypi_0          pypi
rfc3986-validator           0.1.1           pypi_0          pypi
rich                        13.7.1          pypi_0          pypi
rpds-py                     0.19.1          pypi_0          pypi
rsa                         4.9             pypi_0          pypi
scikit-learn                1.3.2           pypi_0          pypi
scipy                       1.10.1          pypi_0          pypi
send2trash                  1.8.3           pypi_0          pypi
setuptools                  69.5.1          py38h06a4308_0
six                         1.16.0          pypi_0          pypi
sniffio                     1.3.1           pypi_0          pypi
soupsieve                   2.5             pypi_0          pypi
sqlite                      3.45.3          h5eee18b_0
stack-data                  0.6.3           pypi_0          pypi
sympy                       1.13.1          pypi_0          pypi
tensorboard                 2.14.0          pypi_0          pypi
tensorboard-data-server     0.7.2           pypi_0          pypi
tensorboardx                2.6             pypi_0          pypi
terminado                   0.18.1          pypi_0          pypi
threadpoolctl               3.5.0           pypi_0          pypi
tinycss2                    1.3.0           pypi_0          pypi
tk                          8.6.14          h39e8969_0
tomli                       2.0.1           pypi_0          pypi
torch                       2.4.0           pypi_0          pypi
torchvision                 0.19.0          pypi_0          pypi
tornado                     6.4.1           pypi_0          pypi
tqdm                        4.66.4          pypi_0          pypi
traitlets                   5.14.3          pypi_0          pypi
triton                      3.0.0           pypi_0          pypi
types-python-dateutil       2.9.0.20240316  pypi_0          pypi
typing-extensions           4.12.2          pypi_0          pypi
tzdata                      2024.1          pypi_0          pypi
uri-template                1.3.0           pypi_0          pypi
urllib3                     2.2.2           pypi_0          pypi
wcwidth                     0.2.13          pypi_0          pypi
webcolors                   24.6.0          pypi_0          pypi
webencodings                0.5.1           pypi_0          pypi
websocket-client            1.8.0           pypi_0          pypi
werkzeug                    3.0.3           pypi_0          pypi
wheel                       0.43.0          py38h06a4308_0
xz                          5.4.6           h5eee18b_1
zipp                        3.19.2          pypi_0          pypi
zlib                        1.2.13          h5eee18b_1
```
kta-intel commented 1 month ago

@hwpang, thanks for raising this issue!

It is a strange one. As far as I can tell, it is triggered when triton imports setuptools. More specifically, setuptools is being imported after distutils has already been imported (I'd have to follow the stack trace deeper to see where this is occurring, though). On import, setuptools attempts to replace distutils with its own bundled (local) copy. If distutils has already been imported, though, this can result in a conflict like the one we are seeing: the assertion fails because `distutils.core` still resolves to the standard-library copy (`/anaconda/envs/openfl-env/lib/python3.8/distutils/core.py`).
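To see which situation an environment is in before running `fx`, a small diagnostic along these lines can help. It is a sketch, not part of OpenFL: `distutils_source` and `report` are hypothetical helper names, the code only inspects `sys.modules` and the `SETUPTOOLS_USE_DISTUTILS` environment variable, and the `'_distutils' in path` test simply mirrors the assertion at the bottom of the traceback. It assumes an unset variable means `local`, which is the default in recent setuptools releases.

```python
import os
import sys
from typing import Optional


def distutils_source(module_file: str) -> str:
    """Classify a distutils module path the way the failing assertion does.

    The traceback ends in ``assert '_distutils' in core.__file__``: setuptools
    expects ``distutils.core`` to come from its bundled ``setuptools/_distutils``.
    """
    return "setuptools-bundled" if "_distutils" in module_file else "stdlib"


def report() -> dict:
    """Summarize the distutils/setuptools import state of this interpreter.

    Hypothetical helper for diagnosis only; it imports nothing itself.
    """
    # Assumption: an unset variable behaves like "local" (recent setuptools default).
    mode = os.environ.get("SETUPTOOLS_USE_DISTUTILS", "local")
    loaded = sys.modules.get("distutils")
    path: Optional[str] = getattr(loaded, "__file__", None) if loaded is not None else None
    return {
        "SETUPTOOLS_USE_DISTUTILS": mode,
        "distutils_already_imported": loaded is not None,
        "distutils_source": distutils_source(path) if path else None,
        "setuptools_already_imported": "setuptools" in sys.modules,
    }


if __name__ == "__main__":
    # Call report() at the point where the import chain would reach setuptools,
    # e.g. just before `import torchvision` in the workspace's data loader module.
    for key, value in report().items():
        print(f"{key}: {value}")
```

If `distutils_already_imported` is `True` with `distutils_source` equal to `stdlib` while the mode is `local`, that matches the conflict described above.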

We'll do a deeper investigation to see to what extent we can resolve this on our end.

For now, there are two quick workarounds:

Option 1. Downgrade to `torch==2.3.1` and `torchvision==0.18.1`. This will automatically install an earlier version of triton that doesn't seem to have this issue. The workspace should install these versions automatically, so if you uninstall torch and torchvision, you should be able to just rerun: `fx workspace create --template torch_cnn_mnist --prefix my_workspace`
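Before rerunning `fx plan initialize`, it can be worth confirming that the downgrade actually took effect. This is a hedged sketch using only the standard library's `importlib.metadata` (Python 3.8+); `PINS`, `base_version`, and `check_pins` are illustrative names, the pinned versions are the ones suggested in Option 1, and triton is deliberately left unpinned because the thread does not say which triton release torch 2.3.1 pulls in. Local build labels such as `+cu121` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Mapping, Optional

# Versions suggested in Option 1; triton is left to torch's own requirement.
PINS = {"torch": "2.3.1", "torchvision": "0.18.1"}


def base_version(v: str) -> str:
    """Drop a PEP 440 local label, e.g. '2.3.1+cu121' -> '2.3.1'."""
    return v.split("+", 1)[0]


def check_pins(pins: Mapping[str, str] = PINS) -> Dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or base_version(installed) != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    if problems:
        print("Pins not satisfied:", problems)
    else:
        print("torch/torchvision match the Option 1 versions")
```

An empty result means both packages report the suggested versions; anything else lists what is actually installed (or `None` if missing).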

-or-

Option 2. If you need the latest torch/triton, try setting `export SETUPTOOLS_USE_DISTUTILS=stdlib`. This makes setuptools use the standard-library distutils instead of its local copy. This worked on my end; hopefully it works on yours too.
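An `export` only lasts for the current shell session. If `fx` is launched from a Python script or notebook instead, the same workaround can be applied by passing the variable in the child process's environment, so it is in place before the new interpreter (and setuptools) starts. A minimal sketch, assuming `fx` is on `PATH`; `env_with_stdlib_distutils` and `run_fx` are hypothetical helper names, not OpenFL APIs.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def env_with_stdlib_distutils(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with SETUPTOOLS_USE_DISTUTILS=stdlib."""
    env = dict(os.environ if base is None else base)  # copy; os.environ is not mutated
    env["SETUPTOOLS_USE_DISTUTILS"] = "stdlib"
    return env


def run_fx(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Hypothetical wrapper: run `fx <args>` with the Option 2 workaround applied."""
    return subprocess.run(["fx", *args], env=env_with_stdlib_distutils(), check=True)


if __name__ == "__main__":
    # Sanity check with the current interpreter instead of fx: the child sees the value.
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['SETUPTOOLS_USE_DISTUTILS'])"],
        env=env_with_stdlib_distutils(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(child.stdout.strip())  # stdlib
```

For example, `run_fx(["plan", "initialize"])` would rerun the failing step. Setting `os.environ` inside an already-running interpreter may come too late, since setuptools can consult the variable as early as interpreter startup; handing it to a fresh process sidesteps that ordering question.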