pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

[Bug] XlaBuilder is already registered when working with HuggingFace Trainer #3940

Open comaniac opened 2 years ago

comaniac commented 2 years ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Install torch_xla at commit https://github.com/pytorch/xla/commit/d8db50a778a39fab0a58436307a3225a6ca06f67
  2. Install HuggingFace transformers at commit https://github.com/huggingface/transformers/commit/06a6a4bd516f7d0ba7c4966a2d3d9c0bf07797ae
  3. Run the following:
>>> from transformers import TrainingArguments
Traceback (most recent call last):
  File "transformers/src/transformers/utils/import_utils.py", line 1030, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "transformers/src/transformers/training_args.py", line 60, in <module>
    import torch_xla.core.xla_model as xm
  File "lib/python3.7/site-packages/torch_xla-1.13-py3.7-linux-x86_64.egg/torch_xla/__init__.py", line 112, in <module>
    import _XLAC
ImportError: generic_type: type "XlaBuilder" is already registered!

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "transformers/src/transformers/utils/import_utils.py", line 1020, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "transformers/src/transformers/utils/import_utils.py", line 1035, in _get_module
    ) from e
RuntimeError: Failed to import transformers.training_args because of the following error (look up to see its traceback):
generic_type: type "XlaBuilder" is already registered!

Expected behavior

Error-free import.

Environment

Additional context

It works if I explicitly import torch_xla before importing TrainingArguments:

>>> import torch_xla
>>> from transformers import TrainingArguments
>>>

JackCaoG commented 2 years ago

Seems to be some pybind bug.

ymwangg commented 2 years ago

Maybe an environment issue? I couldn't reproduce it in the CI docker image gcr.io/tpu-pytorch/xla_base:latest-d8db50a778a39fab0a58436307a3225a6ca06f67 with pip install git+https://github.com/huggingface/transformers@06a6a4b.

comaniac commented 2 years ago

Maybe it's due to a different build process? I didn't build PyTorch from source. Instead, I pip installed nightly PyTorch and manually installed the corresponding libtorch. This is sufficient to build torch_xla and saves a lot of time compared with building PyTorch from source. On the other hand, since nightly libtorch is built with g++ instead of clang++, I specified export CC=gcc CXX=g++ in my build script. However, I have no idea how this would affect the error mentioned in this issue...

jeffhataws commented 1 year ago

The "import _XLAC" error usually happens when TensorFlow or JAX is installed in the same environment. @comaniac can you check?
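
One way to run that check is the minimal sketch below. It is not an official torch_xla diagnostic. It asks Python's import system whether TensorFlow or JAX can be located in the current environment, without importing them. The module names in CANDIDATES and the helper name find_installed are assumptions made for illustration.

```python
import importlib.util

# Assumption: top-level module names that may ship their own XLA bindings.
CANDIDATES = ("tensorflow", "jax", "jaxlib")


def find_installed(names=CANDIDATES):
    """Return the names that the import system can locate, without importing them."""
    found = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec can raise for malformed or partially initialized modules.
            spec = None
        if spec is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    hits = find_installed()
    if hits:
        print("Possibly conflicting packages found:", ", ".join(hits))
    else:
        print("No TensorFlow/JAX packages found in this environment.")
```

If any of these show up, trying the import again in a fresh environment without them would help confirm or rule out this cause.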