pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Try running inference on an ARM CPU #7185

Open duncantech opened 2 months ago

duncantech commented 2 months ago

📚 Documentation

Install the CPU PJRT plugin from the instructions here: https://github.com/pytorch/xla/blob/master/plugins/cpu/README.md

Next, try getting a model to run on an ARM CPU. If it works, create a tutorial on how to get it running.
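As a first smoke test once torch_xla imports cleanly on the target machine, something like the following could confirm that the host is ARM and push a tiny model through an XLA device. This is a sketch under assumptions: it uses torch_xla's built-in CPU runtime, selected with `PJRT_DEVICE=CPU`, rather than the separately built PJRT plugin, and `is_arm_cpu` and `run_tiny_inference` are hypothetical helper names.

```python
import os
import platform
from typing import Optional


def is_arm_cpu(machine: Optional[str] = None) -> bool:
    """Return True if the machine string names a 64-bit ARM CPU.

    macOS reports 'arm64' on Apple Silicon; Linux usually reports 'aarch64'.
    """
    name = (machine if machine is not None else platform.machine()).lower()
    return name in {"arm64", "aarch64"}


def run_tiny_inference():
    """Hypothetical smoke test: run a small Linear layer on an XLA CPU device.

    Assumes torch and a working torch_xla are installed; the imports are
    deferred so this module can be imported without them.
    """
    # Assumption: select torch_xla's built-in CPU PJRT runtime (not the plugin).
    os.environ.setdefault("PJRT_DEVICE", "CPU")
    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    model = torch.nn.Linear(4, 2).to(device).eval()
    with torch.no_grad():
        out = model(torch.randn(1, 4, device=device))
    # Copying the result back to the host forces the lazy XLA graph to run.
    return out.cpu()


if __name__ == "__main__":
    print("ARM host:", is_arm_cpu())
```

If the runtime works, `run_tiny_inference()` should return a 1x2 CPU tensor; an import error or a device error at that point is the thing to report back in this issue.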

tejadhith commented 2 months ago

/assigntome

tejadhith commented 2 months ago

Steps and issues encountered while installing the CPU PJRT plugin:

01: Install torch_xla [Success]

pip install torch_xla
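A successful `pip install` does not show which build was picked up. The `pip list` output later in this thread shows `torch 2.3.0` next to `torch-xla 1.0`, and torch_xla releases normally track the matching torch release, so a quick check like this sketch may be worth running right after this step. `installed_version` and `versions_paired` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def _major_minor(version: str) -> Tuple[str, ...]:
    # Drop local tags such as '+cpu' and keep only 'major.minor'.
    return tuple(version.split("+")[0].split(".")[:2])


def versions_paired(torch_version: str, torch_xla_version: str) -> bool:
    """True when torch and torch_xla share the same major.minor release."""
    return _major_minor(torch_version) == _major_minor(torch_xla_version)


if __name__ == "__main__":
    torch_v = installed_version("torch")
    xla_v = installed_version("torch_xla")
    print("torch:", torch_v, "torch_xla:", xla_v)
    if torch_v and xla_v and not versions_paired(torch_v, xla_v):
        print("warning: torch and torch_xla versions do not match")
```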

02: Build or install the CPU plugin [Failed]

# Build wheel
pip wheel plugins/cpu -v
# Or install directly
pip install plugins/cpu -v

A similar issue was encountered, as mentioned in https://github.com/pytorch/xla/issues/7184#issuecomment-2148759661
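The traceback in step 06 shows that `pip install plugins/cpu` shells out to `bazel build` through `build_util.py`, so the plugin cannot be built unless a `bazel` binary is on `PATH`. A minimal preflight check, assuming only that `bazel` is the external tool the build needs (the default tool list is illustrative, not taken from the repo):

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("bazel",)) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("install these before `pip install plugins/cpu`:", ", ".join(missing))
    else:
        print("all build tools found")
```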

03: Install bazel [Success]

brew install bazel

04: Resolve bazel version mismatch [Success]

 ERROR: The project you're trying to build requires Bazel 6.5.0 (specified in /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/.bazelversion), but it wasn't found in /opt/homebrew/Cellar/bazel/7.1.2/libexec/bin.
cd "/opt/homebrew/Cellar/bazel/7.1.2/libexec/bin" && curl -fLO https://releases.bazel.build/6.5.0/release/bazel-6.5.0-darwin-arm64 && chmod +x bazel-6.5.0-darwin-arm64
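The error in step 04 comes from the repo pinning Bazel 6.5.0 in `.bazelversion` while Homebrew installed 7.1.2. A small check like this sketch compares the two before a long build starts. The helper names are hypothetical, and it relies only on `bazel --version` printing a line such as `bazel 7.1.2`. Bazelisk, which reads `.bazelversion` and fetches the pinned release itself, is an alternative to copying a binary into Homebrew's Cellar directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def required_bazel_version(repo_root: str) -> str:
    """Read the version pinned in <repo_root>/.bazelversion."""
    return Path(repo_root, ".bazelversion").read_text().strip()


def parse_bazel_version(output: str) -> Optional[str]:
    """Extract '7.1.2' from `bazel --version` output such as 'bazel 7.1.2'."""
    parts = output.split()
    if len(parts) >= 2 and parts[0] == "bazel":
        # Drop suffixes such as '-homebrew' that some packagers append.
        return parts[1].split("-")[0]
    return None


def installed_bazel_version(cwd: Optional[str] = None) -> Optional[str]:
    """Run `bazel --version` if bazel is on PATH; None otherwise.

    Run it from outside the repo (cwd) so a mismatched .bazelversion
    does not make the wrapper fail before printing its own version.
    """
    exe = shutil.which("bazel")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, cwd=cwd)
    return parse_bazel_version(result.stdout) if result.returncode == 0 else None
```

Comparing `required_bazel_version("xla")` with `installed_bazel_version(cwd="/tmp")` would have flagged the 6.5.0 vs 7.1.2 mismatch seen above.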

05: C++ standard version mismatch [Success]

The following was added to .bazelrc:

build --cxxopt=-std=gnu++17 
build --host_cxxopt=-std=gnu++17
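To make the step 05 change repeatable (for example in a setup script) without stacking duplicate flags, an idempotent edit helps. A minimal sketch; `merge_bazelrc` and `patch_bazelrc` are hypothetical helpers, and only the two option lines come from this thread.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

CXX17_LINES = (
    "build --cxxopt=-std=gnu++17",
    "build --host_cxxopt=-std=gnu++17",
)


def merge_bazelrc(text: str, lines: Sequence[str] = CXX17_LINES) -> Tuple[str, List[str]]:
    """Append any of `lines` not already present in `text`.

    Returns the new file contents and the lines that were added, so running
    it twice does not duplicate the options.
    """
    present = {line.strip() for line in text.splitlines()}
    added = [line for line in lines if line not in present]
    if not added:
        return text, []
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(added) + "\n", added


def patch_bazelrc(path: str = ".bazelrc") -> List[str]:
    """Apply merge_bazelrc to a file on disk (creating it if missing)."""
    rc = Path(path)
    old = rc.read_text() if rc.exists() else ""
    new, added = merge_bazelrc(old)
    if added:
        rc.write_text(new)
    return added
```

After patching, Bazel's `--announce_rc` output should end with `--cxxopt=-std=gnu++17 --host_cxxopt=-std=gnu++17`, as it does in the step 06 log.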

06: Retry installing the CPU plugin [Failed]

$ pip install plugins/cpu -v
Using pip 23.3.1 from /Users/tej/anaconda3/envs/PyTorch/lib/python3.11/site-packages/pip (python 3.11)
Processing ./plugins/cpu
  Running command pip subprocess to install build dependencies
  Collecting setuptools
    Using cached setuptools-70.0.0-py3-none-any.whl.metadata (5.9 kB)
  Using cached setuptools-70.0.0-py3-none-any.whl (863 kB)
  Installing collected packages: setuptools
  Successfully installed setuptools-70.0.0
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  bazel build //plugins/cpu:pjrt_c_api_cpu_plugin.so --symlink_prefix=/Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/plugins/cpu/bazel- --remote_default_exec_properties=cache-silo-key=dev
  INFO: Options provided by the client:
    Inherited 'common' options: --isatty=0 --terminal_columns=80
  INFO: Reading rc options for 'build' from /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/.bazelrc:
    Inherited 'common' options: --experimental_repo_remote_exec
  INFO: Reading rc options for 'build' from /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/.bazelrc:
    'build' options: --announce_rc --nocheck_visibility --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_hdfs_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --action_env=CC=gcc --action_env=CXX=g++ --spawn_strategy=standalone --incompatible_strict_action_env --noremote_upload_local_results --java_runtime_version=remotejdk_11 --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 --define framework_shared_object=false --define tsl_protobuf_header_only=false --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --define=with_xla_support=true --noincompatible_remove_legacy_whole_archive --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility --cxxopt=-std=gnu++17 --host_cxxopt=-std=gnu++17
  INFO: Found applicable config definition build:short_logs in file /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
  Loading:
  Loading:
  Loading: 0 packages loaded
  INFO: Build options --cxxopt and --host_cxxopt have changed, discarding analysis cache.
  Analyzing: target //plugins/cpu:pjrt_c_api_cpu_plugin.so (0 packages loaded, 0 targets configured)
  INFO: Analyzed target //plugins/cpu:pjrt_c_api_cpu_plugin.so (1 packages loaded, 10840 targets configured).
   checking cached actions
  INFO: Found 1 target...
  [1 / 5] [Prepa] BazelWorkspaceStatusAction stable-status.txt
  [249 / 1,676] Compiling llvm/lib/Demangle/RustDemangle.cpp [for tool]; 1s local ... (7 actions, 6 running)
  [381 / 1,889] Compiling src/google/protobuf/compiler/zip_writer.cc [for tool]; 1s local ... (7 actions, 6 running)
  [1,621 / 3,679] Compiling src/google/protobuf/compiler/code_generator.cc [for tool]; 2s local ... (7 actions, 6 running)
  [2,685 / 5,997] Compiling src/google/protobuf/compiler/python/helpers.cc [for tool]; 2s local ... (6 actions running)
  [2,952 / 6,589] Compiling xla/ef57.cc; 2s local ... (7 actions running)
  [2,956 / 6,589] Compiling src/google/protobuf/compiler/python/pyi_generator.cc [for tool]; 3s local ... (5 actions running)
  [6,588 / 6,589] Linking plugins/cpu/pjrt_c_api_cpu_plugin.so; 0s local
  ERROR: /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/plugins/cpu/BUILD:17:14: Linking plugins/cpu/pjrt_c_api_cpu_plugin.so failed: (Exit 1): cc_wrapper.sh failed: error executing command (from target //plugins/cpu:pjrt_c_api_cpu_plugin.so) external/local_config_cc/cc_wrapper.sh @bazel-out/darwin_arm64-opt/bin/plugins/cpu/pjrt_c_api_cpu_plugin.so-2.params
  ld: unknown options: --version-script --no-undefined
  clang: error: linker command failed with exit code 1 (use -v to see invocation)
  Target //plugins/cpu:pjrt_c_api_cpu_plugin.so failed to build
  Use --verbose_failures to see the command lines of failed build steps.
  INFO: Elapsed time: 9.527s, Critical Path: 4.55s
  INFO: 27 processes: 2 internal, 25 local.
  FAILED: Build did NOT complete successfully
  Traceback (most recent call last):
    File "/Users/tej/anaconda3/envs/PyTorch/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/Users/tej/anaconda3/envs/PyTorch/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/Users/tej/anaconda3/envs/PyTorch/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
             ^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/wc/rkrv7ck92zd4f_3qgk8q2gn00000gn/T/pip-build-env-3vcjdfr7/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/wc/rkrv7ck92zd4f_3qgk8q2gn00000gn/T/pip-build-env-3vcjdfr7/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
      self.run_setup()
    File "/private/var/folders/wc/rkrv7ck92zd4f_3qgk8q2gn00000gn/T/pip-build-env-3vcjdfr7/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 311, in run_setup
      exec(code, locals())
    File "<string>", line 10, in <module>
    File "/Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/plugins/cpu/../../build_util.py", line 67, in bazel_build
      subprocess.check_call(bazel_argv, stdout=sys.stdout, stderr=sys.stderr)
    File "/Users/tej/anaconda3/envs/PyTorch/lib/python3.11/subprocess.py", line 413, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['bazel', 'build', '//plugins/cpu:pjrt_c_api_cpu_plugin.so', '--symlink_prefix=/Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/plugins/cpu/bazel-', '--remote_default_exec_properties=cache-silo-key=dev']' returned non-zero exit status 1.
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /Users/tej/anaconda3/envs/PyTorch/bin/python /Users/tej/anaconda3/envs/PyTorch/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py get_requires_for_build_wheel /var/folders/wc/rkrv7ck92zd4f_3qgk8q2gn00000gn/T/tmpn1xqffzo
  cwd: /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/plugins/cpu
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
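With `pip install -v`, the Bazel failure is buried under pip's own traceback; the lines that matter are Bazel's `ERROR:` line and the linker output right after it. A rough filter, assuming the log was saved to a file (for example with `pip install plugins/cpu -v 2>&1 | tee build.log`; the file name is arbitrary) and that the prefixes below are the ones worth keeping:

```python
import re
import sys
from typing import List

# Bazel errors plus ld / clang linker messages; pip's own lowercase
# "error:" lines are deliberately not matched.
ROOT_CAUSE = re.compile(r"^\s*(ERROR: |ld: |clang: error: )")


def root_cause_lines(log_text: str) -> List[str]:
    """Return the stripped log lines that look like the underlying build failure."""
    return [line.strip() for line in log_text.splitlines() if ROOT_CAUSE.match(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python root_cause.py build.log
    with open(sys.argv[1]) as f:
        for line in root_cause_lines(f.read()):
            print(line)
```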

Machine Specs


$ python -V
Python 3.11.9
$ pip list
Package                   Version
------------------------- --------------
accelerate                0.30.0.dev0
aiohttp                   3.9.5
aiosignal                 1.3.1
anyio                     4.3.0
appnope                   0.1.4
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.1
async-lru                 2.0.4
attrs                     23.2.0
audioread                 3.0.1
Babel                     2.14.0
beautifulsoup4            4.12.3
bitsandbytes              0.42.0
bleach                    6.1.0
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
comm                      0.2.2
contourpy                 1.2.1
cycler                    0.12.1
datasets                  2.19.1
debugpy                   1.8.1
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.8
executing                 2.0.1
fastjsonschema            2.19.1
filelock                  3.13.4
fonttools                 4.51.0
fqdn                      1.5.1
frozenlist                1.4.1
fsspec                    2024.3.1
h11                       0.14.0
httpcore                  1.0.5
httpx                     0.27.0
huggingface-hub           0.22.2
idna                      3.7
ipykernel                 6.29.4
ipython                   8.24.0
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.3
joblib                    1.4.2
json5                     0.9.25
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter_client            8.6.1
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.5
jupyter_server            2.14.0
jupyter_server_terminals  0.5.3
jupyterlab                4.1.8
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.1
kiwisolver                1.4.5
lazy_loader               0.4
librosa                   0.10.2
llvmlite                  0.42.0
MarkupSafe                2.1.5
matplotlib                3.8.4
matplotlib-inline         0.1.7
mistune                   3.0.2
mpmath                    1.3.0
msgpack                   1.0.8
multidict                 6.0.5
multiprocess              0.70.16
nbclient                  0.10.0
nbconvert                 7.16.3
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.3
notebook                  7.1.3
notebook_shim             0.2.4
numba                     0.59.1
numpy                     1.26.4
overrides                 7.7.0
packaging                 24.0
pandas                    2.2.2
pandocfilters             1.5.1
parso                     0.8.4
pexpect                   4.9.0
pillow                    10.3.0
pip                       23.3.1
platformdirs              4.2.1
pooch                     1.8.1
prometheus_client         0.20.0
prompt-toolkit            3.0.43
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   16.0.0
pyarrow-hotfix            0.6
pycparser                 2.22
Pygments                  2.17.2
pyparsing                 3.1.2
python-dateutil           2.9.0.post0
python-json-logger        2.0.7
pytube                    15.0.0
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     26.0.2
referencing               0.35.0
regex                     2024.5.10
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rpds-py                   0.18.0
safetensors               0.4.3
scikit-learn              1.4.2
scipy                     1.13.0
seaborn                   0.13.2
Send2Trash                1.8.3
sentencepiece             0.2.0
setuptools                68.2.2
six                       1.16.0
sniffio                   1.3.1
soundfile                 0.12.1
soupsieve                 2.5
soxr                      0.3.7
stack-data                0.6.3
sympy                     1.12
terminado                 0.18.1
threadpoolctl             3.5.0
tinycss2                  1.3.0
tokenizers                0.19.1
torch                     2.3.0
torch-xla                 1.0
torchaudio                2.3.0
torchvision               0.18.0
tornado                   6.4
tqdm                      4.66.2
traitlets                 5.14.3
transformers              4.40.2
types-python-dateutil     2.9.0.20240316
typing_extensions         4.11.0
tzdata                    2024.1
uri-template              1.3.0
urllib3                   2.2.1
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.8.0
wheel                     0.41.2
xgboost                   2.0.3
xxhash                    3.4.1
yarl                      1.9.4
$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 14.4.1 (23E224)
      Kernel Version: Darwin 23.4.0
      Boot Volume: Macintosh HD
      ...

Hardware:

    Hardware Overview:

      Model Name: MacBook Air
      Chip: Apple M1
      Total Number of Cores: 8 (4 performance and 4 efficiency)
      Memory: 8 GB
      ...
tejadhith commented 2 months ago

@duncantech May I know if this is what's expected? Or is there something wrong with what I'm doing?

JackCaoG commented 2 months ago

The real error seems to be:

  ERROR: /Users/tej/Documents/GitHub-Repositories/MachineLearning/Docathon-2024/xla/plugins/cpu/BUILD:17:14: Linking plugins/cpu/pjrt_c_api_cpu_plugin.so failed: (Exit 1): cc_wrapper.sh failed: error executing command (from target //plugins/cpu:pjrt_c_api_cpu_plugin.so) external/local_config_cc/cc_wrapper.sh @bazel-out/darwin_arm64-opt/bin/plugins/cpu/pjrt_c_api_cpu_plugin.so-2.params
  ld: unknown options: --version-script --no-undefined
  clang: error: linker command failed with exit code 1 (use -v to see invocation)

I asked Bard, and it told me:

"Platform incompatibility: These options might be specific to certain platforms or linkers. For example, --no-undefined is generally used with the GNU linker, and it may not be supported on other linkers like the one Apple uses for macOS. Similarly, --version-script is used to control symbol versions and might not be available on all platforms."

I am guessing the ARM CPU build does not work out of the box and requires us to tweak the build config.