pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Running locally? #2642

Closed tyoc213 closed 3 years ago

tyoc213 commented 3 years ago

❓ Questions and Help

@zcain117 from https://github.com/pytorch/xla/issues/2272#issuecomment-692289094, do I need to create an env var on my PC, or is that not possible at all and I need to run it on GCP? I have a card with Driver Version: 450.80.02, CUDA Version: 11.0.

$ docker run --gpus all -it --shm-size 16G gcr.io/tpu-pytorch/xla@sha256:efe47b7a3875ddfa3ea9c68a12ed5517c19cbb3cf272776fba64bec8a683299f
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled 
zcain117 commented 3 years ago

I don't think any of us have tried on a PC. The error you're seeing is from docker, not from pytorch/xla. Are you able to use your GPU for other deep learning training on this PC?

tyoc213 commented 3 years ago

Yeah, but directly in conda; I haven't used a Docker image for CUDA (I'm a bit of a Docker noob and thought I could just follow the provided commands). I will try a CUDA Docker setup, see if I can run torch there, and get back with that info.

tyoc213 commented 3 years ago

Oh, I was missing nvidia-docker...
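For reference, a rough sketch of the missing piece (assuming Ubuntu with apt; the exact repository setup is in NVIDIA's container-toolkit docs, and the CUDA image tag is only an example):

# Install the NVIDIA container runtime (after adding NVIDIA's package repo per their docs)
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
# Sanity check: the GPU should be visible from inside a container
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi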

(pytorch) root@b6cd5fe3508e:/# python pytorch/xla/test/test_train_mnist.py 
E
======================================================================
ERROR: test_accurracy (__main__.TrainMnist)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytorch/xla/test/test_train_mnist.py", line 186, in test_accurracy
    self.assertGreaterEqual(train_mnist(), FLAGS.target_accuracy)
  File "pytorch/xla/test/test_train_mnist.py", line 112, in train_mnist
    max_devices=FLAGS.num_cores) if FLAGS.num_cores != 0 else [])
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/core/xla_model.py", line 136, in get_xla_supported_devices
    xla_devices = _DEVICES.value
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/utils/utils.py", line 32, in value
    self._value = self._gen_fn()
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/core/xla_model.py", line 18, in <lambda>
    _DEVICES = xu.LazyProperty(lambda: torch_xla._XLAC._xla_get_devices())
RuntimeError: tensorflow/compiler/xla/xla_client/computation_client.cc:274 : Missing XLA configuration

Then

(pytorch) root@2f6b27f59262:/# export XRT_TPU_CONFIG="tpu_worker;0;127.0.0.1:8470"
(pytorch) root@2f6b27f59262:/# python pytorch/xla/test/test_train_mnist.py
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw/train-images-idx3-ubyte.gz
9920512it [00:04, 2427319.82it/s]                                                                                                                                                                                    
Extracting /tmp/mnist-data/MNIST/raw/train-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw/train-labels-idx1-ubyte.gz
32768it [00:00, 103196.34it/s]                                                                                                                                                                                       
Extracting /tmp/mnist-data/MNIST/raw/train-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw/t10k-images-idx3-ubyte.gz
1654784it [00:01, 1379910.54it/s]                                                                                                                                                                                    
Extracting /tmp/mnist-data/MNIST/raw/t10k-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw/t10k-labels-idx1-ubyte.gz
8192it [00:00, 34214.57it/s]                                                                                                                                                                                         
Extracting /tmp/mnist-data/MNIST/raw/t10k-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw
Processing...
/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torchvision/datasets/mnist.py:480: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Done!

So it is working?... Hmm, it seems that after that it doesn't return to the interactive prompt and I need to docker kill blissful_galileo or crazy_neumann.

Hmm, so now it is working? And seeing that the code is there... how can I test a "change" to something? Because I tried to compile, and at least pytorch says no CUDA??

(pytorch) root@d0c34dcade02:/pytorch# python setup.py install
Building wheel torch-1.7.0a0+b5d75dd
-- Building version 1.7.0a0+b5d75dd
cmake --build . --target install --config Release -- -j 4
[  0%] Built target clog
[  0%] Built target pthreadpool
[  0%] Built target libprotobuf-lite
.....
.....
[ 96%] Built target test_cpp_rpc
[100%] Built target torch_python
Install the project...
-- Install configuration: "Release"
running install
running build
running build_py
copying torch/version.py -> build/lib.linux-x86_64-3.6/torch
copying caffe2/proto/torch_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/caffe2_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/prof_dag_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/predictor_consts_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/hsm_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/metanet_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/caffe2_legacy_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
running build_ext
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using CUDA
-- Using MKLDNN
-- Not using CBLAS in MKLDNN
-- Not using NCCL
-- Building with distributed package 

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state.cpython-36m-x86_64-linux-gnu.so to /pytorch/build/lib.linux-x86_64-3.6/caffe2/python/caffe2_pybind11_state.cpython-36m-x86_64-linux-gnu.so
running install_lib
copying build/lib.linux-x86_64-3.6/caffe2/proto/torch_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/caffe2/proto/caffe2_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/caffe2/proto/prof_dag_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/caffe2/proto/predictor_consts_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/caffe2/proto/hsm_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/caffe2/proto/metanet_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/caffe2/proto/caffe2_legacy_pb2.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto
copying build/lib.linux-x86_64-3.6/torch/version.py -> /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/torch_pb2.py to torch_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/caffe2_pb2.py to caffe2_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/prof_dag_pb2.py to prof_dag_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/predictor_consts_pb2.py to predictor_consts_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/hsm_pb2.py to hsm_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/metanet_pb2.py to metanet_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/caffe2/proto/caffe2_legacy_pb2.py to caffe2_legacy_pb2.cpython-36.pyc
byte-compiling /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/version.py to version.cpython-36.pyc
running install_egg_info
running egg_info
writing torch.egg-info/PKG-INFO
writing dependency_links to torch.egg-info/dependency_links.txt
writing entry points to torch.egg-info/entry_points.txt
writing requirements to torch.egg-info/requires.txt
writing top-level names to torch.egg-info/top_level.txt
reading manifest file 'torch.egg-info/SOURCES.txt'
writing manifest file 'torch.egg-info/SOURCES.txt'
Copying torch.egg-info to /root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch-1.7.0a0+b5d75dd-py3.6.egg-info
running install_scripts
Installing convert-caffe2-to-onnx script to /root/anaconda3/envs/pytorch/bin
Installing convert-onnx-to-caffe2 script to /root/anaconda3/envs/pytorch/bin

(pytorch) root@d0c34dcade02:/pytorch# cd xla/
(pytorch) root@d0c34dcade02:/pytorch/xla# python setup.py install
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Building torch_xla version: 1.6
XLA Commit ID: 8d960a52c82b7e8490800c577ea97c203bc702e6
PyTorch Commit ID: b5d75dddd93089458c7aee91134ff281d5c3b580
Extracted 1623 functions (0 errors) from /pytorch/xla/scripts/../../torch/csrc/autograd/generated/RegistrationDeclarations.h
358 function overrides in /pytorch/xla/scripts/../torch_xla/csrc/aten_xla_type.h
Generated 1623 wrappers for /pytorch/xla/scripts/../../torch/csrc/autograd/generated/RegistrationDeclarations.h
+ OPTS=()
+ getopts O: OPTION
+ case $OPTION in
+ for i in ${OPTARG}
+ OPTS+=("--cxxopt=${i}")
+ getopts O: OPTION
+ shift 2
+ CMD=install
++ dirname /pytorch/xla/build_torch_xla_libs.sh
+ cd /pytorch/xla
+++ pwd
++ printf '%q\n' /pytorch/xla
+ PWD=/pytorch/xla
+ BASE_DIR=/pytorch/xla
+ echo /pytorch/xla
/pytorch/xla
+ THIRD_PARTY_DIR=/pytorch/xla/third_party
+ MODE=opt
+ [[ '' == \1 ]]
+ VERBOSE=
+ [[ '' == \1 ]]
+ MAX_JOBS=
+ [[ 1 == \1 ]]
+ [[ true == \t\r\u\e ]]
+ MAX_JOBS=--jobs=16
+ OPTS+=(--cxxopt="-std=c++14")
++ basename -- clang-8
+ [[ clang-8 =~ ^clang ]]
+ OPTS+=(--cxxopt="-Wno-c++11-narrowing")
+ [[ 1 == \1 ]]
+ OPTS+=(--cxxopt="-DXLA_CUDA=1")
+ OPTS+=(--config=cuda)
+ '[' install == clean ']'
+ cp -r -u -p /pytorch/xla/third_party/xla_client /pytorch/xla/third_party/tensorflow/tensorflow/compiler/xla/
+ pushd /pytorch/xla/third_party/tensorflow
/pytorch/xla/third_party/tensorflow /pytorch/xla
+ bazel build --jobs=16 --define framework_shared_object=false -c opt --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --cxxopt=-std=c++14 --cxxopt=-Wno-c++11-narrowing --cxxopt=-DXLA_CUDA=1 --config=cuda //tensorflow/compiler/xla/xla_client:libxla_computation_client.so
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=213
INFO: Reading rc options for 'build' from /pytorch/xla/third_party/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /pytorch/xla/third_party/tensorflow/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=short_logs --config=v2
INFO: Found applicable config definition build:short_logs in file /pytorch/xla/third_party/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /pytorch/xla/third_party/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /pytorch/xla/third_party/tensorflow/.bazelrc: --config=using_cuda --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:using_cuda in file /pytorch/xla/third_party/tensorflow/.bazelrc: --define=using_cuda=true --action_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --define=tensorflow_enable_mlir_generated_gpu_kernels=1
INFO: Found applicable config definition build:linux in file /pytorch/xla/third_party/tensorflow/.bazelrc: --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /pytorch/xla/third_party/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Analyzed target //tensorflow/compiler/xla/xla_client:libxla_computation_client.so (215 packages loaded, 22587 targets configured).
INFO: Found 1 target...
INFO: Deleting stale sandbox base /root/.cache/bazel/_bazel_root/df10a46af757c9817bc8ced472071f03/sandbox
[5,154 / 15,277] 16 actions, 4 running
    Compiling external/llvm-project/llvm/lib/Support/Path.cpp [for host]; 2s local
    Compiling external/aws/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp [for host]; 0s local
    Compiling external/aws/aws-cpp-sdk-s3/source/model/VersioningConfiguration.cpp [for host]; 0s local
    Compiling external/aws/aws-cpp-sdk-s3/source/model/UploadPartRequest.cpp [for host]; 0s local
    [Sched] Compiling external/boringssl/src/crypto/x509/x509_txt.c [for host]
    [Sched] Compiling external/boringssl/src/crypto/x509/x509_vfy.c [for host]
    [Sched] Compiling external/llvm-project/llvm/lib/Support/CRC.cpp [for host]
    [Sched] Compiling external/aws/aws-cpp-sdk-s3/source/model/TopicConfigurationDeprecated.cpp 

That is taking quite some time...

Is this config correct? Will XLA still be using the GPU? If so, I guess I will be able to pull the changes since that day and rebuild with the latest XLA, won't I?

Is there a command inside the docker shell that allows me to test if XLA is effectively running on GPU and not on CPU?

zcain117 commented 3 years ago

If you set XRT_TPU_CONFIG, I think it will try to use TPU. I would use:

unset XRT_TPU_CONFIG
export GPU_NUM_DEVICES=1   # or however many GPUs you want to use

After you see Done! in those MNIST logs, it means it's done downloading the data. After that download, it sounds like the process hung, which is expected when you set XRT_TPU_CONFIG to an invalid value. In your case you want to use GPU_NUM_DEVICES instead.
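Putting that together, a minimal sketch of the GPU run inside the container (same test script as above; the device count is just an example):

unset XRT_TPU_CONFIG               # stop pointing XRT at a nonexistent TPU worker
export GPU_NUM_DEVICES=1           # or however many GPUs you want to use
python pytorch/xla/test/test_train_mnist.py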

tyoc213 commented 3 years ago

Thanks, it is working!!!!

| Test Device=xla:1 Accuracy=98.98 Time=21:17:40
Epoch: 18, Mean Accuracy: 98.98%
Max Accuracy: 99.07%
.
----------------------------------------------------------------------
Ran 1 test in 164.207s

OK

real    2m45.772s
user    5m26.679s
sys 0m40.018s

Though it doesn't say whether it is using GPU or CPU, only xla:1... and nvidia-smi (on my Linux host) doesn't show much memory consumption... I don't know if I just can't see what is happening inside Docker?
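One quick check (a sketch; it reuses the same internal call that shows up in the traceback earlier in this thread, so it should be available in this build):

# List the underlying devices the XLA client sees; with the GPU backend active
# this should include GPU:... entries rather than only CPU:... ones.
python -c "import torch_xla; print(torch_xla._XLAC._xla_get_devices())"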

UPDATE: it seems it's not that it isn't using memory, it is using some memory :)

(screenshot: nvidia-smi showing some GPU memory in use)

Now I want to see whether it's possible to "make changes" to some code. Building takes quite some time inside the docker image, but tomorrow I will fire up the docker image again and compile again from zero.

How would I go about this? Can I compile, restart Docker, and avoid starting from zero? (The first compile took >5 hours with the sources in the image.)

zcain117 commented 3 years ago

Nice job getting this working! The best way to add new code to the docker image would be to make a new Dockerfile that builds on the pytorch/xla docker image

So you'd make a new Dockerfile kind of like this:

FROM gcr.io/tpu-pytorch/xla:nightly_3.6

(all the work that you did on top of the base image, including those slow compiles)

Then you'd use docker build to make it, try it out and check that it works, and then you could use docker push to push it somewhere if you want. Or you could continue using the image locally if it's stored on a consistent machine.
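A minimal sketch of that workflow (the image names and tags here are placeholders):

# Build a new image from the Dockerfile above, with your slow compiles baked in
docker build -t my-pytorch-xla:dev .
# Run it the same way as the base image
docker run --gpus all -it --shm-size 16G my-pytorch-xla:dev
# Optionally push it to a registry you control
# docker tag my-pytorch-xla:dev <your-registry>/my-pytorch-xla:dev
# docker push <your-registry>/my-pytorch-xla:dev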

tyoc213 commented 3 years ago

Thanks, I will look into it; I guess I will close this. Thanks for the hints!

tyoc213 commented 3 years ago

@zcain117 I tried to compile and run locally... conda create -n xla python=3.6, then installed everything needed and compiled pytorch in about 2 hours, plus more time for xla; both compiled OK. Then I set the env var for GPU usage, but it complains about torchvision.

(xla) tyoc213@u:~/Documents/github/pytorch/xla$ python test/test_train_mnist.py 
Traceback (most recent call last):
  File "test/test_train_mnist.py", line 19, in <module>
    from torchvision import datasets, transforms
ModuleNotFoundError: No module named 'torchvision'
$ pip freeze
brotlipy==0.7.0
certifi==2020.11.8
cffi @ file:///tmp/build/80754af9/cffi_1605538083887/work
chardet @ file:///tmp/build/80754af9/chardet_1605303175790/work
cryptography @ file:///tmp/build/80754af9/cryptography_1605544480695/work
dataclasses==0.7
future==0.18.2
idna @ file:///tmp/build/80754af9/idna_1593446292537/work
lark-parser==0.11.1
mkl-fft==1.2.0
mkl-random==1.1.1
mkl-service==2.3.0
numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1603487797006/work
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1605545627475/work
PySocks @ file:///tmp/build/80754af9/pysocks_1605305763431/work
PyYAML==5.3.1
requests @ file:///tmp/build/80754af9/requests_1592841827918/work
six @ file:///tmp/build/80754af9/six_1605205335545/work
torch==1.8.0a0
torch-xla==1.6
typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1598376058250/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1603305693037/work

And

 python torch/utils/collect_env.py
Collecting environment information...
PyTorch version: 1.8.0a0+671ee71
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1 
CMake version: version 3.18.2

Python version: 3.6 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080
Nvidia driver version: 450.80.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.0a0
[pip3] torch-xla==1.6
[conda] blas                      1.0                         mkl  
[conda] magma-cuda110             2.5.2                         1    pytorch
[conda] mkl                       2020.2                      256  
[conda] mkl-include               2020.2                      256  
[conda] mkl-service               2.3.0            py36he904b0f_0  
[conda] mkl_fft                   1.2.0            py36h23d657b_0  
[conda] mkl_random                1.1.1            py36h0573a6f_0  
[conda] numpy                     1.19.2           py36h54aff64_0  
[conda] numpy-base                1.19.2           py36hfa32c7d_0  
[conda] torch                     1.8.0a0                  pypi_0    pypi
[conda] torch-xla                 1.6                      pypi_0    pypi

Any hint? Well, I should build torchvision, but with which params? Because if I do pip install, it tries to download pytorch and so on...

zcain117 commented 3 years ago

I think it would be easier to start with our docker image or our conda env and then install the extra libraries you need

You can see here how we build torch, torch_xla, and torchvision to produce our wheels, which we later offer in our docker image and conda env.
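For the torchvision part specifically, a minimal sketch of building it from source against the torch already installed in the env, so pip doesn't try to pull a prebuilt torch (the repo URL is the standard torchvision repository):

git clone https://github.com/pytorch/vision
cd vision
python setup.py install   # builds against the torch already importable in this env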

tyoc213 commented 3 years ago

Fixed the torchvision install @zcain117, but now it shows this; should I open a new issue, or do we try to solve it here?

 python test/test_train_mp_mnist.py 
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /tmp/mnist-data/0/MNIST/raw/train-images-idx3-ubyte.gz
100.1%Extracting /tmp/mnist-data/0/MNIST/raw/train-images-idx3-ubyte.gz to /tmp/mnist-data/0/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /tmp/mnist-data/0/MNIST/raw/train-labels-idx1-ubyte.gz
113.5%Extracting /tmp/mnist-data/0/MNIST/raw/train-labels-idx1-ubyte.gz to /tmp/mnist-data/0/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /tmp/mnist-data/0/MNIST/raw/t10k-images-idx3-ubyte.gz
100.4%Extracting /tmp/mnist-data/0/MNIST/raw/t10k-images-idx3-ubyte.gz to /tmp/mnist-data/0/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /tmp/mnist-data/0/MNIST/raw/t10k-labels-idx1-ubyte.gz
180.4%Extracting /tmp/mnist-data/0/MNIST/raw/t10k-labels-idx1-ubyte.gz to /tmp/mnist-data/0/MNIST/raw
Processing...
/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torchvision/datasets/mnist.py:480: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:143.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Done!
Traceback (most recent call last):
  File "test/test_train_mp_mnist.py", line 190, in <module>
    xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=FLAGS.num_cores)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 386, in spawn
    _start_fn(0, pf_cfg, fn, args)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "test/test_train_mp_mnist.py", line 180, in _mp_fn
    accuracy = train_mnist()
  File "test/test_train_mp_mnist.py", line 115, in train_mnist
    model = MNIST().to(device)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 629, in to
    return self._apply(convert)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 381, in _apply
    param_applied = fn(param)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 627, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:383 : Check failed: session->session()->Run( session_work->feed_inputs, session_work->outputs_handles, &outputs) == ::tensorflow::Status::OK() (Invalid argument: Cannot assign a device for operation XRTAllocateFromTensor: {{node XRTAllocateFromTensor}} was explicitly assigned to /job:localservice/replica:0/task:0/device:XLA_GPU:0 but available devices are [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
     [[XRTAllocateFromTensor]] vs. OK)
*** Begin stack trace ***
    tensorflow::CurrentStackTrace[abi:cxx11]()

    xla::util::MultiWait::Complete(std::function<void ()> const&)

    clone
*** End stack trace ***
zcain117 commented 3 years ago

Did you set GPU_NUM_DEVICES=1?

And @JackCaoG, is there any other reason you can think of why we would see:

RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:383 : Check failed: session->session()->Run( session_work->feed_inputs, session_work->outputs_handles, &outputs) == ::tensorflow::Status::OK() (Invalid argument: Cannot assign a device for operation XRTAllocateFromTensor: {{node XRTAllocateFromTensor}} was explicitly assigned to /job:localservice/replica:0/task:0/device:XLA_GPU:0 but available devices are [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.

@tyoc213 can you share more information about your env too? What are the versions of torch, torch_xla, and torchvision? And what is the code, for example how are you getting the device?

tyoc213 commented 3 years ago

Yeah, I did set it, like so:

$ export GPU_NUM_DEVICES=1
(xla) tyoc213@u:~/Documents/github/pytorch/xla$ python test/test_train_mnist.py 
E
======================================================================
ERROR: test_accurracy (__main__.TrainMnist)
----------------------------------------------------------------------
.................
RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:383 : Check failed: session->session()->Run( session_work->feed_inputs, session_work->outputs_handles, &outputs) == ::tensorflow::Status::OK() (Invalid argument: Cannot assign a device for operation XRTAllocateFromTensor: {{node XRTAllocateFromTensor}} was explicitly assigned to /job:localservice/replica:0/task:0/device:XLA_GPU:0 but available devices are [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
     [[XRTAllocateFromTensor]] vs. OK)
*** Begin stack trace ***
    tensorflow::CurrentStackTrace[abi:cxx11]()

    xla::util::MultiWait::Complete(std::function<void ()> const&)

    clone
*** End stack trace ***

----------------------------------------------------------------------
Ran 1 test in 0.068s

FAILED (errors=1)

Same with $ GPU_NUM_DEVICES=1 python test/test_train_mnist.py

I created the env with conda create -n xla python=3.6 as the place to work before compiling everything from source as described in CONTRIBUTING.md, then followed the build steps, so pytorch, xla and torchvision are all compiled from source.

$ python torch/utils/collect_env.py
Collecting environment information...
PyTorch version: 1.8.0a0+671ee71
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.18.2

Python version: 3.6 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080
Nvidia driver version: 450.80.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.0a0
[pip3] torch-xla==1.6
[pip3] torchvision==0.9.0a0+4521f6d
[conda] blas                      1.0                         mkl
[conda] magma-cuda110             2.5.2                         1    pytorch
[conda] mkl                       2020.2                      256
[conda] mkl-include               2020.2                      256
[conda] mkl-service               2.3.0            py36he904b0f_0
[conda] mkl_fft                   1.2.0            py36h23d657b_0
[conda] mkl_random                1.1.1            py36h0573a6f_0
[conda] numpy                     1.19.2           py36h54aff64_0
[conda] numpy-base                1.19.2           py36hfa32c7d_0
[conda] torch                     1.8.0a0                  pypi_0    pypi
[conda] torch-xla                 1.6                      pypi_0    pypi
[conda] torchvision               0.9.0a0+4521f6d          pypi_0    pypi

I guess the code for getting the device is

https://github.com/pytorch/xla/blob/master/test/test_train_mnist.py#L110L112

devices = (
      xm.get_xla_supported_devices(
          max_devices=FLAGS.num_cores) if FLAGS.num_cores != 0 else [])

And printing FLAGS and devices:

Namespace(batch_size=128, datadir='/tmp/mnist-data', drop_last=False, fake_data=False, log_steps=20, logdir=None, lr=0.01, metrics_debug=False, momentum=0.5, num_cores=None, num_epochs=18, num_workers=4, target_accuracy=98.0, tidy=False)
Devices ['xla:1']
JackCaoG commented 3 years ago

It seems like it cannot find the GPU device. This is my config for a 4 GPU setup:

export TF_CUDA_PATHS="/usr/local/cuda,/usr"
# Enabled 4 GPU devices (trim to 1 eventually)
export XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0|GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0|GPU:1;/job:localservice/replica:0/task:0/device:XLA_GPU:1|GPU:2;/job:localservice/replica:0/task:0/device:XLA_GPU:2|GPU:3;/job:localservice/replica:0/task:0/device:XLA_GPU:3"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"

This config might be a bit out of date; I haven't changed it for a while.

tyoc213 commented 3 years ago

Hi there, I only have 1 GPU (in a few weeks I will probably get a second one).

(xla) tyoc213@u:~/Documents/github/pytorch/xla$ export XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0|GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0"
(xla) tyoc213@u:~/Documents/github/pytorch/xla$ export XRT_WORKERS="localservice:0;grpc://localhost:40934"
(xla) tyoc213@u:~/Documents/github/pytorch/xla$ GPU_NUM_DEVICES=1 python test/test_train_mnist.py 
-------------------------------
Namespace(batch_size=128, datadir='/tmp/mnist-data', drop_last=False, fake_data=False, log_steps=20, logdir=None, lr=0.01, metrics_debug=False, momentum=0.5, num_cores=None, num_epochs=18, num_workers=4, target_accuracy=98.0, tidy=False)
Devices ['xla:1']
Hi!!!!
E
======================================================================
ERROR: test_accurracy (__main__.TrainMnist)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_train_mnist.py", line 189, in test_accurracy
    self.assertGreaterEqual(train_mnist(), FLAGS.target_accuracy)
  File "test/test_train_mnist.py", line 119, in train_mnist
    model_parallel = dp.DataParallel(MNIST, device_ids=devices)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg/torch_xla/distributed/data_parallel.py", line 58, in __init__
    device_module = deepcopy(module).to(device=torch.device(device))
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 629, in to
    return self._apply(convert)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 381, in _apply
    param_applied = fn(param)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 627, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:394 : Check failed: session->session()->Run( session_work->feed_inputs, session_work->outputs_handles, &outputs) == ::tensorflow::Status::OK() (Invalid argument: Cannot assign a device for operation XRTAllocateFromTensor: {{node XRTAllocateFromTensor}} was explicitly assigned to /job:localservice/replica:0/task:0/device:XLA_GPU:0 but available devices are [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
     [[XRTAllocateFromTensor]] vs. OK)
*** Begin stack trace ***
    tensorflow::CurrentStackTrace[abi:cxx11]()

    xla::util::MultiWait::Complete(std::function<void ()> const&)

    clone
*** End stack trace ***

----------------------------------------------------------------------
Ran 1 test in 0.054s

FAILED (errors=1)

That Hi!!! is from C++, which is why the line changed to xrt_computation_client.cc:394, so if you want me to put in any code to check the XLA GPU directly from C++, just throw me the patch :+1: :).


By the way, regarding that export for TF_CUDA_PATHS: I don't have those paths here. Which files should I look for inside those directories locally, so I can set the correct value for that env var? Right now I just put in the one you provided.

My CUDA drivers are managed by the system, so I don't need to reinstall them on an nvidia driver update.

(screenshot attached)
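A quick way to see where CUDA actually lives on the host before setting TF_CUDA_PATHS (a sketch; the paths depend on how the driver and toolkit were installed):

# Is there a CUDA toolkit under /usr/local/cuda?
ls /usr/local/cuda/bin/nvcc /usr/local/cuda/lib64/libcudart.so* 2>/dev/null
# Are the CUDA/cuDNN libraries visible to the dynamic loader (e.g. under /usr)?
ldconfig -p | grep -E 'libcudart|libcudnn'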

zcain117 commented 3 years ago

I see Is CUDA available: False in your env

Maybe that is part of the problem. In our script to build wheels, there are a few spots for setting up CUDA, such as:

https://github.com/pytorch/xla/blob/master/scripts/build_torch_wheels.sh#L40

https://github.com/pytorch/xla/blob/master/scripts/build_torch_wheels.sh#L61
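For reference, a hedged sketch of the usual environment knobs for a CUDA-enabled torch build; the authoritative values are in the build_torch_wheels.sh lines linked above, so treat these particular names and values as assumptions:

export CUDA_HOME=/usr/local/cuda
export USE_CUDA=1                      # build torch itself with CUDA support
export TORCH_CUDA_ARCH_LIST="7.5"      # RTX 2080 is compute capability 7.5
python setup.py install                # rerun the torch build with these set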

tyoc213 commented 3 years ago

OK, I thought that was only to enable CUDA for XLA and disable it for pytorch; I will check those 2 points and see how to enable it. (BRB)

zcain117 commented 3 years ago

I was just guessing, not sure. Our XLA GPU support is very new and we don't have any tests running for it yet. I have only gotten it working with docker, not with conda. Maybe the fix is some of those env vars you see in build_torch_wheels.sh

tyoc213 commented 3 years ago

Almost there, but I got this error almost three and a half hours in:

/usr/include/c++/9/bits/stl_list.h:140:28: error: #if with no expression
  140 | # if _GLIBCXX_USE_CXX11_ABI
      |                            ^
/usr/include/c++/9/bits/stl_list.h:152:27: error: #if with no expression
  152 | #if _GLIBCXX_USE_CXX11_ABI
      |                           ^
/usr/include/c++/9/bits/stl_list.h:399:27: error: #if with no expression
  399 | #if _GLIBCXX_USE_CXX11_ABI
      |                           ^
/usr/include/c++/9/bits/stl_list.h:640:27: error: #if with no expression
  640 | #if _GLIBCXX_USE_CXX11_ABI
      |                           ^
/usr/include/c++/9/bits/stl_list.h:1993:27: error: #if with no expression
 1993 | #if _GLIBCXX_USE_CXX11_ABI
      |                           ^
/usr/include/c++/9/bits/stl_list.h:2062:27: error: #if with no expression
 2062 | #if _GLIBCXX_USE_CXX11_ABI
      |                           ^
In file included from /usr/include/c++/9/list:64,
                 from /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/external/com_google_absl/absl/hash/internal/hash.h:31,
                 from /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/external/com_google_absl/absl/hash/hash.h:73,
                 from /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/external/com_google_absl/absl/container/internal/hash_function_defaults.h:55,
                 from /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/external/com_google_absl/absl/container/flat_hash_map.h:40,
                 from /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow/tensorflow/compiler/xla/client/xla_builder.h:24,
                 from /home/tyoc213/Documents/github/pytorch/xla/torch_xla/csrc/helpers.h:12,
                 from /home/tyoc213/Documents/github/pytorch/xla/test/cpp/torch_xla_test.cpp:10:
/usr/include/c++/9/bits/list.tcc:179:27: error: #if with no expression
  179 | #if _GLIBCXX_USE_CXX11_ABI
      |                           ^
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:82: CMakeFiles/test_ptxla.dir/main.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:108: CMakeFiles/test_ptxla.dir/metrics_snapshot.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:225: CMakeFiles/test_ptxla.dir/torch_xla_test.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:160: CMakeFiles/test_ptxla.dir/test_mayberef.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:95: CMakeFiles/test_ptxla.dir/cpp_test_util.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:147: CMakeFiles/test_ptxla.dir/test_ir.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:121: CMakeFiles/test_ptxla.dir/test_async_task.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:199: CMakeFiles/test_ptxla.dir/test_tensor.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:212: CMakeFiles/test_ptxla.dir/test_xla_util_cache.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:173: CMakeFiles/test_ptxla.dir/test_op_by_op_executor.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:186: CMakeFiles/test_ptxla.dir/test_replication.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_ptxla.dir/build.make:134: CMakeFiles/test_ptxla.dir/test_aten_xla_tensor.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:97: CMakeFiles/test_ptxla.dir/all] Error 2
make: *** [Makefile:103: all] Error 2
Failed to build tests: ['/home/tyoc213/Documents/github/pytorch/xla/test/cpp/run_tests.sh', '-B']

real    221m47.624s
user    23m48.158s
sys 1m27.458s

build.log


It seems to be used here:

https://github.com/pytorch/xla/blob/8af57fb58f54fa7ac9e56505e202ae6ae034879a/setup.py#L122

And

https://github.com/pytorch/xla/blob/0ce386bdebe7c3c7c130814dc3e0905afd22aded/test/cpp/CMakeLists.txt#L14

If I execute that inside the docker image:

(pytorch) root@4126231fbe83:/# python -c "import torch; print(int(torch._C._GLIBCXX_USE_CXX11_ABI))"
0
tyoc213 commented 3 years ago

Ah wait... on my computer, inside the xla env, I get this:

(xla) tyoc213@u:~/Documents/github/pytorch/xla$ python -c "import torch; print(int(torch._C._GLIBCXX_USE_CXX11_ABI))"
1

Hmm, probably my pytorch was not built correctly, even though I did export CC=clang-8 CXX=clang++-8????

tyoc213 commented 3 years ago

Looking at TorchConfig.cmake

$ cat torch/share/cmake/Torch/TorchConfig.cmake | grep 11
  set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=")

After reading this thread https://github.com/pytorch/pytorch/issues/17492#issuecomment-490214327

I guess I will set it to 0 and start compiling pytorch again?


Well, I guess I'm stuck; suggestions?


From https://discuss.pytorch.org/t/undefined-symbol-when-import-lltm-cpp-extension/32627/2 I got this:

(xla) tyoc213@u:~/Documents/github/pytorch/xla$ find  ~/miniconda3/envs/xla/lib/python3.6/site-packages/torch* -name "*.so" -exec bash -c "nm -D {} | grep SourceLocation" \;

                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c107Warning4warnENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
000000000374dd80 T _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE
000000000375cf50 T _ZN5torch3jit6tracer23setRecordSourceLocationEPFvPNS0_4NodeEE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
000000000004d6d0 T _ZN3c1014WarningHandler7processERKNS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
000000000004e5a0 T _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
000000000004e5a0 T _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
000000000004d590 T _ZN3c107Warning4warnENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c107Warning4warnENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
0000000000776e90 T _ZN5torch16PyWarningHandler7processERKN3c1014SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
                 U _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE
                 U _ZN5torch3jit6tracer23setRecordSourceLocationEPFvPNS0_4NodeEE
0000000000a81bb0 T _ZN5torch3jit6tracer26pythonRecordSourceLocationEPNS0_4NodeE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c107Warning4warnENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
00000000004841e0 T _ZN9torch_xlalsERSoRKSt6vectorINS_14SourceLocationESaIS2_EE
0000000000484450 W _ZNSt6vectorIN9torch_xla14SourceLocationESaIS1_EE17_M_realloc_insertIJS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_
                 U _ZN3c105ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
00000000004841e0 T _ZN9torch_xlalsERSoRKSt6vectorINS_14SourceLocationESaIS2_EE
0000000000484450 W _ZNSt6vectorIN9torch_xla14SourceLocationESaIS1_EE17_M_realloc_insertIJS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_
000000000bb038d0 T _ZN6google8protobuf8compiler19SourceLocationTable3AddEPKNS0_7MessageENS0_14DescriptorPool14ErrorCollector13ErrorLocationEii
000000000bb0f4f0 T _ZN6google8protobuf8compiler19SourceLocationTable5ClearEv
000000000bb039b0 T _ZN6google8protobuf8compiler19SourceLocationTable9AddImportEPKNS0_7MessageERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEii
000000000bb0f120 T _ZN6google8protobuf8compiler19SourceLocationTableC1Ev
000000000bb0f120 T _ZN6google8protobuf8compiler19SourceLocationTableC2Ev
000000000bb0f170 T _ZN6google8protobuf8compiler19SourceLocationTableD1Ev
000000000bb0f170 T _ZN6google8protobuf8compiler19SourceLocationTableD2Ev
000000000bb24360 T _ZNK6google8protobuf10Descriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb252f0 T _ZNK6google8protobuf14EnumDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb24240 T _ZNK6google8protobuf14FileDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb240f0 T _ZNK6google8protobuf14FileDescriptor17GetSourceLocationERKSt6vectorIiSaIiEEPNS0_14SourceLocationE
000000000bb24880 T _ZNK6google8protobuf15FieldDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb25030 T _ZNK6google8protobuf15OneofDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb25810 T _ZNK6google8protobuf16MethodDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb25ad0 T _ZNK6google8protobuf17ServiceDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb25d80 T _ZNK6google8protobuf19EnumValueDescriptor17GetSourceLocationEPNS0_14SourceLocationE
000000000bb143a0 T _ZNK6google8protobuf20FileDescriptorTables17GetSourceLocationERKSt6vectorIiSaIiEEPKNS0_14SourceCodeInfoE
000000000bb0f230 T _ZNK6google8protobuf8compiler19SourceLocationTable10FindImportEPKNS0_7MessageERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPiSE_
000000000bb0f1b0 T _ZNK6google8protobuf8compiler19SourceLocationTable4FindEPKNS0_7MessageENS0_14DescriptorPool14ErrorCollector13ErrorLocationEPiS9_

So, unstuck again?? I guess I will let the computer work overnight compiling pytorch again with export CFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0 $CFLAGS" and see tomorrow if I can finish compiling xla (though I didn't get whether, xla being an extension, I need to set it back to 1???; but I will start with pytorch set to 0).


Or is it xla for which I need to set/force the CXX11 ABI to 0???

SUBCOMMAND: # @llvm-project//llvm:CodeGen [action 'Compiling external/llvm-project/llvm/lib/CodeGen/ExpandPostRAPseudos.cpp', configuration: de7de810a2b8c83f94aaebd5888fb8bdaa61e7f5b5cf58c9ddc23dcfd6221e1a]
(cd /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/tyoc213/.cache/bazelisk/downloads/bazelbuild/bazel-3.1.0-linux-x86_64/bin:/home/tyoc213/.deta/bin:/home/tyoc213/miniconda3/envs/xla/bin:/home/tyoc213/miniconda3/condabin:/home/tyoc213/.rvm/gems/ruby-2.7.0/bin:/home/tyoc213/.rvm/gems/ruby-2.7.0@global/bin:/home/tyoc213/.rvm/rubies/ruby-2.7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/tyoc213/.rvm/bin:/home/tyoc213/.rvm/bin:/usr/local/go/bin:/home/tyoc213/go/bin \
    PWD=/proc/self/cwd \
    TF2_BEHAVIOR=1 \
    TF_NEED_CUDA=1 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/CodeGen/ExpandPostRAPseudos.pic.d '-frandom-seed=bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/CodeGen/ExpandPostRAPseudos.pic.o' -DLLVM_ENABLE_STATS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DLLVM_BUILD_GLOBAL_ISEL -iquote external/llvm-project -iquote bazel-out/k8-dbg/bin/external/llvm-project -iquote external/zlib -iquote bazel-out/k8-dbg/bin/external/zlib -isystem external/llvm-project/llvm/include -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/include -isystem external/zlib -isystem bazel-out/k8-dbg/bin/external/zlib -isystem external/llvm-project/llvm/include/llvm/IR -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/include/llvm/IR -isystem external/llvm-project/llvm/lib/Target/AMDGPU -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/lib/Target/AMDGPU -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -g -w -DAUTOLOAD_DYNAMIC_KERNELS '-std=c++14' '-D_GLIBCXX_USE_CXX11_ABI=1' '-std=c++14' -Wno-c++11-narrowing '-DXLA_CUDA=1' -c external/llvm-project/llvm/lib/CodeGen/ExpandPostRAPseudos.cpp -o bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/CodeGen/ExpandPostRAPseudos.pic.o)
SUBCOMMAND: # @llvm-project//llvm:CodeGen [action 'Compiling external/llvm-project/llvm/lib/CodeGen/ShadowStackGCLowering.cpp', configuration: de7de810a2b8c83f94aaebd5888fb8bdaa61e7f5b5cf58c9ddc23dcfd6221e1a]
(cd /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/tyoc213/.cache/bazelisk/downloads/bazelbuild/bazel-3.1.0-linux-x86_64/bin:/home/tyoc213/.deta/bin:/home/tyoc213/miniconda3/envs/xla/bin:/home/tyoc213/miniconda3/condabin:/home/tyoc213/.rvm/gems/ruby-2.7.0/bin:/home/tyoc213/.rvm/gems/ruby-2.7.0@global/bin:/home/tyoc213/.rvm/rubies/ruby-2.7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/tyoc213/.rvm/bin:/home/tyoc213/.rvm/bin:/usr/local/go/bin:/home/tyoc213/go/bin \
    PWD=/proc/self/cwd \
    TF2_BEHAVIOR=1 \
    TF_NEED_CUDA=1 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/CodeGen/ShadowStackGCLowering.pic.d '-frandom-seed=bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/CodeGen/ShadowStackGCLowering.pic.o' -DLLVM_ENABLE_STATS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DLLVM_BUILD_GLOBAL_ISEL -iquote external/llvm-project -iquote bazel-out/k8-dbg/bin/external/llvm-project -iquote external/zlib -iquote bazel-out/k8-dbg/bin/external/zlib -isystem external/llvm-project/llvm/include -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/include -isystem external/zlib -isystem bazel-out/k8-dbg/bin/external/zlib -isystem external/llvm-project/llvm/include/llvm/IR -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/include/llvm/IR -isystem external/llvm-project/llvm/lib/Target/AMDGPU -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/lib/Target/AMDGPU -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -g -w -DAUTOLOAD_DYNAMIC_KERNELS '-std=c++14' '-D_GLIBCXX_USE_CXX11_ABI=1' '-std=c++14' -Wno-c++11-narrowing '-DXLA_CUDA=1' -c external/llvm-project/llvm/lib/CodeGen/ShadowStackGCLowering.cpp -o bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/CodeGen/ShadowStackGCLowering.pic.o)

You can see that '-D_GLIBCXX_USE_CXX11_ABI=1' being passed on... I wonder.


Setting the variable directly did not have the desired effect of putting a 0 there... and it seems that bazel is the one requesting compatibility with 1.

$ time python setup.py install
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Building torch_xla version: 1.6
XLA Commit ID: 403919d167efc64e6d86b269fdd1e2c4a69903de
PyTorch Commit ID: 671ee71ad4b6f507218d1cad278a8e743780b716
Extracted 1779 functions (0 errors) from /home/tyoc213/Documents/github/pytorch/xla/scripts/../../build/aten/src/ATen/RegistrationDeclarations.h
365 function overrides in /home/tyoc213/Documents/github/pytorch/xla/scripts/../torch_xla/csrc/aten_xla_type.h
Generated 1779 wrappers for /home/tyoc213/Documents/github/pytorch/xla/scripts/../../build/aten/src/ATen/RegistrationDeclarations.h
+ OPTS=()
+ getopts O: OPTION
+ case $OPTION in
+ for i in ${OPTARG}
+ OPTS+=("--cxxopt=${i}")
+ getopts O: OPTION
+ shift 2
+ CMD=install
++ dirname /home/tyoc213/Documents/github/pytorch/xla/build_torch_xla_libs.sh
+ cd /home/tyoc213/Documents/github/pytorch/xla
+++ pwd
++ printf '%q\n' /home/tyoc213/Documents/github/pytorch/xla
+ PWD=/home/tyoc213/Documents/github/pytorch/xla
+ BASE_DIR=/home/tyoc213/Documents/github/pytorch/xla
+ echo /home/tyoc213/Documents/github/pytorch/xla
/home/tyoc213/Documents/github/pytorch/xla
+ THIRD_PARTY_DIR=/home/tyoc213/Documents/github/pytorch/xla/third_party
+ MODE=opt
+ [[ 1 == \1 ]]
+ MODE=dbg
+ VERBOSE=
+ [[ 1 == \1 ]]
+ VERBOSE=-s
+ MAX_JOBS=
+ [[ 1 == \1 ]]
+ [[ '' == \t\r\u\e ]]
+ OPTS+=(--cxxopt="-std=c++14")
++ basename -- clang-8
+ [[ clang-8 =~ ^clang ]]
+ OPTS+=(--cxxopt="-Wno-c++11-narrowing")
+ [[ 1 == \1 ]]
+ OPTS+=(--cxxopt="-DXLA_CUDA=1")
+ OPTS+=(--config=cuda)
+ '[' install == clean ']'
+ cp -r -u -p /home/tyoc213/Documents/github/pytorch/xla/third_party/xla_client /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow/tensorflow/compiler/xla/
+ pushd /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow
~/Documents/github/pytorch/xla/third_party/tensorflow ~/Documents/github/pytorch/xla
+ bazel build -s --define framework_shared_object=false -c dbg --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1 --cxxopt=-std=c++14 --cxxopt=-Wno-c++11-narrowing --cxxopt=-DXLA_CUDA=1 --config=cuda //tensorflow/compiler/xla/xla_client:libxla_computation_client.so

Last line printed just after executing (xla) tyoc213@u:~/Documents/github/pytorch/xla$ time python setup.py install:

  • bazel build -s --define framework_shared_object=false -c dbg --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1 --cxxopt=-std=c++14 --cxxopt=-Wno-c++11-narrowing --cxxopt=-DXLA_CUDA=1 --config=cuda

Or better, I will just wait for feedback!!!! to see how to solve this behaviour.

tyoc213 commented 3 years ago

Think I have found what I was missing: 18fa011c3eba79b93ea38cf5b0946920e45b2053

(xla) tyoc213@u:~/Documents/github/pytorch/xla$ time python setup.py install
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Building torch_xla version: 1.6
XLA Commit ID: 403919d167efc64e6d86b269fdd1e2c4a69903de
PyTorch Commit ID: 671ee71ad4b6f507218d1cad278a8e743780b716
Extracted 1779 functions (0 errors) from /home/tyoc213/Documents/github/pytorch/xla/scripts/../../build/aten/src/ATen/RegistrationDeclarations.h
365 function overrides in /home/tyoc213/Documents/github/pytorch/xla/scripts/../torch_xla/csrc/aten_xla_type.h
Generated 1779 wrappers for /home/tyoc213/Documents/github/pytorch/xla/scripts/../../build/aten/src/ATen/RegistrationDeclarations.h
+ OPTS=()
+ getopts O: OPTION
+ case $OPTION in
+ for i in ${OPTARG}
+ OPTS+=("--cxxopt=${i}")
+ getopts O: OPTION
+ shift 2
+ CMD=install
++ dirname /home/tyoc213/Documents/github/pytorch/xla/build_torch_xla_libs.sh
+ cd /home/tyoc213/Documents/github/pytorch/xla
+++ pwd
++ printf '%q\n' /home/tyoc213/Documents/github/pytorch/xla
+ PWD=/home/tyoc213/Documents/github/pytorch/xla
+ BASE_DIR=/home/tyoc213/Documents/github/pytorch/xla
+ echo /home/tyoc213/Documents/github/pytorch/xla
/home/tyoc213/Documents/github/pytorch/xla
+ THIRD_PARTY_DIR=/home/tyoc213/Documents/github/pytorch/xla/third_party
+ MODE=opt
+ [[ 1 == \1 ]]
+ MODE=dbg
+ VERBOSE=
+ [[ 1 == \1 ]]
+ VERBOSE=-s
+ MAX_JOBS=
+ [[ 1 == \1 ]]
+ [[ '' == \t\r\u\e ]]
+ OPTS+=(--cxxopt="-std=c++14")
++ basename -- clang-8
+ [[ clang-8 =~ ^clang ]]
+ OPTS+=(--cxxopt="-Wno-c++11-narrowing")
+ [[ 1 == \1 ]]
+ OPTS+=(--cxxopt="-DXLA_CUDA=1")
+ OPTS+=(--config=cuda)
+ '[' install == clean ']'
+ cp -r -u -p /home/tyoc213/Documents/github/pytorch/xla/third_party/xla_client /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow/tensorflow/compiler/xla/
+ pushd /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow
~/Documents/github/pytorch/xla/third_party/tensorflow ~/Documents/github/pytorch/xla
+ bazel build -s --define framework_shared_object=false -c dbg --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1 --cxxopt=-std=c++14 --cxxopt=-Wno-c++11-narrowing --cxxopt=-DXLA_CUDA=1 --config=cuda //tensorflow/compiler/xla/xla_client:libxla_computation_client.so
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=211
INFO: Reading rc options for 'build' from /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/tyoc213/Documents/github/pytorch/xla/third_party/tensorflow/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=short_logs --config=v2

Because even though bazel shows D_GLIBCXX_USE_CXX11_ABI=1, it seems that it is not passed to the compiler anymore... will see tomorrow...


False alarm, I started to see D_GLIBCXX_USE_CXX11_ABI=1:

SUBCOMMAND: # @llvm-project//llvm:Support [action 'Compiling external/llvm-project/llvm/lib/Support/YAMLParser.cpp', configuration: de7de810a2b8c83f94aaebd5888fb8bdaa61e7f5b5cf58c9ddc23dcfd6221e1a]
(cd /home/tyoc213/.cache/bazel/_bazel_tyoc213/19011bf2b17a0b6da5215ad1a05b9611/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/tyoc213/.cache/bazelisk/downloads/bazelbuild/bazel-3.1.0-linux-x86_64/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/go/bin:/home/tyoc213/go/bin:/home/tyoc213/.deta/bin:/home/tyoc213/miniconda3/envs/xla/bin:/home/tyoc213/miniconda3/condabin:/home/tyoc213/.rvm/gems/ruby-2.7.0/bin:/home/tyoc213/.rvm/gems/ruby-2.7.0@global/bin:/home/tyoc213/.rvm/rubies/ruby-2.7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/tyoc213/.rvm/bin:/home/tyoc213/.rvm/bin:/usr/local/go/bin:/home/tyoc213/go/bin \
    PWD=/proc/self/cwd \
    TF2_BEHAVIOR=1 \
    TF_NEED_CUDA=1 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/Support/YAMLParser.pic.d '-frandom-seed=bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/Support/YAMLParser.pic.o' -DLLVM_ENABLE_STATS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DLLVM_BUILD_GLOBAL_ISEL -iquote external/llvm-project -iquote bazel-out/k8-dbg/bin/external/llvm-project -iquote external/zlib -iquote bazel-out/k8-dbg/bin/external/zlib -isystem external/llvm-project/llvm/include -isystem bazel-out/k8-dbg/bin/external/llvm-project/llvm/include -isystem external/zlib -isystem bazel-out/k8-dbg/bin/external/zlib -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -g -w -DAUTOLOAD_DYNAMIC_KERNELS '-std=c++14' '-D_GLIBCXX_USE_CXX11_ABI=1' '-std=c++14' -Wno-c++11-narrowing '-DXLA_CUDA=1' -c external/llvm-project/llvm/lib/Support/YAMLParser.cpp -o bazel-out/k8-dbg/bin/external/llvm-project/llvm/_objs/Support/YAMLParser.pic.o)

So I'm waiting for feedback :). And sorry for the spam this has generated.

tyoc213 commented 3 years ago

Today try "forcing" D_GLIBCXX_USE_CXX11_ABI=0 for building pytorch with clang... lib/libtorch_cuda.so: undefined reference to gloo::EnforceNotMet::EnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string

The end of the log:

[5102/5621] Linking CXX executable bin/List_test
FAILED: bin/List_test 
: && /usr/bin/clang++-8  -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp=libiomp5 -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG  -rdynamic -pthread caffe2/CMakeFiles/List_test.dir/__/aten/src/ATen/core/List_test.cpp.o  -o bin/List_test  -Wl,-rpath,/home/tyoc213/Documents/github/pytorch/build/lib:/home/tyoc213/miniconda3/envs/xla/lib:/usr/local/cuda/lib64:  lib/libgtest_main.a  lib/libtorch.so  lib/libtorch_cuda.so  lib/libtorch_cpu.so  lib/libprotobuf.a  /home/tyoc213/miniconda3/envs/xla/lib/libmkl_intel_lp64.so  /home/tyoc213/miniconda3/envs/xla/lib/libmkl_intel_thread.so  /home/tyoc213/miniconda3/envs/xla/lib/libmkl_core.so  /home/tyoc213/miniconda3/envs/xla/lib/libiomp5.so  /usr/lib/x86_64-linux-gnu/libpthread.so  -lm  /usr/lib/x86_64-linux-gnu/libdl.so  lib/libdnnl.a  -ldl  lib/libc10_cuda.so  lib/libc10.so  /usr/local/cuda/lib64/libcudart.so  /usr/lib/x86_64-linux-gnu/libnvToolsExt.so  /usr/local/cuda/lib64/libcufft.so  /usr/local/cuda/lib64/libcurand.so  /usr/local/cuda/lib64/libcublas.so  /usr/local/cuda/lib64/libcudnn.so  lib/libgtest.a  -pthread && :
/usr/bin/ld: lib/libtorch_cuda.so: undefined reference to `gloo::getCudaPCIBusID(int)'
/usr/bin/ld: lib/libtorch_cuda.so: undefined reference to `gloo::EnforceNotMet::EnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[5103/5621] Linking CXX executable bin/kernel_lambda_test
FAILED: bin/kernel_lambda_test 
: && /usr/bin/clang++-8  -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp=libiomp5 -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG  -rdynamic -pthread caffe2/CMakeFiles/kernel_lambda_test.dir/__/aten/src/ATen/core/boxing/impl/kernel_lambda_test.cpp.o  -o bin/kernel_lambda_test  -Wl,-rpath,/home/tyoc213/Documents/github/pytorch/build/lib:/home/tyoc213/miniconda3/envs/xla/lib:/usr/local/cuda/lib64:  lib/libgtest_main.a  lib/libtorch.so  lib/libtorch_cuda.so  lib/libtorch_cpu.so  lib/libprotobuf.a  /home/tyoc213/miniconda3/envs/xla/lib/libmkl_intel_lp64.so  /home/tyoc213/miniconda3/envs/xla/lib/libmkl_intel_thread.so  /home/tyoc213/miniconda3/envs/xla/lib/libmkl_core.so  /home/tyoc213/miniconda3/envs/xla/lib/libiomp5.so  /usr/lib/x86_64-linux-gnu/libpthread.so  -lm  /usr/lib/x86_64-linux-gnu/libdl.so  lib/libdnnl.a  -ldl  lib/libc10_cuda.so  lib/libc10.so  /usr/local/cuda/lib64/libcudart.so  /usr/lib/x86_64-linux-gnu/libnvToolsExt.so  /usr/local/cuda/lib64/libcufft.so  /usr/local/cuda/lib64/libcurand.so  /usr/local/cuda/lib64/libcublas.so  /usr/local/cuda/lib64/libcudnn.so  lib/libgtest.a  -pthread && :
/usr/bin/ld: lib/libtorch_cuda.so: undefined reference to `gloo::getCudaPCIBusID(int)'
/usr/bin/ld: lib/libtorch_cuda.so: undefined reference to `gloo::EnforceNotMet::EnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[5105/5621] Building CXX object caffe2/CMakeFiles/blob_test.dir/core/blob_test.cc.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "setup.py", line 765, in <module>
    build_deps()
  File "setup.py", line 320, in build_deps
    cmake=cmake)
  File "/home/tyoc213/Documents/github/pytorch/tools/build_pytorch_libs.py", line 58, in build_caffe2
    cmake.build(my_env)
  File "/home/tyoc213/Documents/github/pytorch/tools/setup_helpers/cmake.py", line 346, in build
    self.run(build_args, my_env)
  File "/home/tyoc213/Documents/github/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '4']' returned non-zero exit status 1.
tyoc213 commented 3 years ago

yey!!!

creating dist
creating 'dist/torch_xla-1.6-py3.6-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing torch_xla-1.6-py3.6-linux-x86_64.egg
removing '/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg' (and everything under it)
creating /home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg
Extracting torch_xla-1.6-py3.6-linux-x86_64.egg to /home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages
torch-xla 1.6 is already the active version in easy-install.pth

Installed /home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg
Processing dependencies for torch-xla==1.6
Finished processing dependencies for torch-xla==1.6

real    310m6.579s
user    18m15.809s
sys 0m46.391s
(xla) tyoc213@u:~/Documents/github/pytorch/xla$ GPU_NUM_DEVICES=1 python test/test_train_mnist.py 
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw/train-images-idx3-ubyte.gz
100.1%Extracting /tmp/mnist-data/MNIST/raw/train-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw/train-labels-idx1-ubyte.gz
113.5%Extracting /tmp/mnist-data/MNIST/raw/train-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw/t10k-images-idx3-ubyte.gz
100.4%Extracting /tmp/mnist-data/MNIST/raw/t10k-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw/t10k-labels-idx1-ubyte.gz
180.4%Extracting /tmp/mnist-data/MNIST/raw/t10k-labels-idx1-ubyte.gz to /tmp/mnist-data/MNIST/raw
Processing...
/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torchvision/datasets/mnist.py:480: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:143.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Done!
-------------------------------
Namespace(batch_size=128, datadir='/tmp/mnist-data', drop_last=False, fake_data=False, log_steps=20, logdir=None, lr=0.01, metrics_debug=False, momentum=0.5, num_cores=None, num_epochs=18, num_workers=4, target_accuracy=98.0, tidy=False)
Devices ['xla:1']
| Training Device=xla:1 Step=0 Loss=2.39387 Rate=22.27 GlobalRate=22.27 Time=01:40:52
| Training Device=xla:1 Step=20 Loss=1.89920 Rate=705.78 GlobalRate=338.02 Time=01:40:55
| Training Device=xla:1 Step=40 Loss=1.53655 Rate=7301.92 GlobalRate=642.28 Time=01:40:55
| Training Device=xla:1 Step=60 Loss=1.15975 Rate=9244.97 GlobalRate=928.00 Time=01:40:55
| Training Device=xla:1 Step=80 Loss=0.88960 Rate=10819.44 GlobalRate=1201.46 Time=01:40:55
| Training Device=xla:1 Step=100 Loss=0.64108 Rate=11282.54 GlobalRate=1460.74 Time=01:40:56
| Training Device=xla:1 Step=120 Loss=0.61638 Rate=10608.90 GlobalRate=1701.55 Time=01:40:56
| Training Device=xla:1 Step=140 Loss=0.56248 Rate=11931.89 GlobalRate=1940.21 Time=01:40:56
| Training Device=xla:1 Step=160 Loss=0.43999 Rate=12205.81 GlobalRate=2167.27 Time=01:40:56
| Training Device=xla:1 Step=180 Loss=0.36814 Rate=12085.87 GlobalRate=2383.06 Time=01:40:56
| Training Device=xla:1 Step=200 Loss=0.35669 Rate=11310.73 GlobalRate=2583.36 Time=01:40:57
| Training Device=xla:1 Step=220 Loss=0.29758 Rate=12208.81 GlobalRate=2784.53 Time=01:40:57


It ended:

| Training Device=xla:1 Step=260 Loss=0.01900 Rate=11762.88 GlobalRate=10227.78 Time=01:42:55
| Training Device=xla:1 Step=280 Loss=0.07001 Rate=10513.46 GlobalRate=10186.79 Time=01:42:55
| Training Device=xla:1 Step=300 Loss=0.04425 Rate=11148.52 GlobalRate=10268.42 Time=01:42:55
| Training Device=xla:1 Step=320 Loss=0.01296 Rate=11348.16 GlobalRate=10336.48 Time=01:42:55
| Training Device=xla:1 Step=340 Loss=0.00769 Rate=11608.61 GlobalRate=10411.42 Time=01:42:56
| Training Device=xla:1 Step=360 Loss=0.04162 Rate=11921.40 GlobalRate=10493.78 Time=01:42:56
| Training Device=xla:1 Step=380 Loss=0.00810 Rate=11889.04 GlobalRate=10557.93 Time=01:42:56
| Training Device=xla:1 Step=400 Loss=0.00523 Rate=12281.69 GlobalRate=10641.95 Time=01:42:56
| Training Device=xla:1 Step=420 Loss=0.04701 Rate=10899.44 GlobalRate=10608.41 Time=01:42:57
| Training Device=xla:1 Step=440 Loss=0.01576 Rate=11402.72 GlobalRate=10654.92 Time=01:42:57
| Training Device=xla:1 Step=460 Loss=0.00855 Rate=11852.48 GlobalRate=10712.19 Time=01:42:57
| Test Device=xla:1 Accuracy=99.07 Time=01:42:58
Epoch: 17, Mean Accuracy: 99.07%
| Training Device=xla:1 Step=0 Loss=0.00389 Rate=445.03 GlobalRate=445.02 Time=01:42:59
| Training Device=xla:1 Step=20 Loss=0.00971 Rate=5302.44 GlobalRate=4576.33 Time=01:42:59
| Training Device=xla:1 Step=40 Loss=0.00490 Rate=9665.09 GlobalRate=6634.88 Time=01:42:59
| Training Device=xla:1 Step=60 Loss=0.03418 Rate=10904.95 GlobalRate=7736.94 Time=01:42:59
| Training Device=xla:1 Step=80 Loss=0.01144 Rate=11444.04 GlobalRate=8456.28 Time=01:43:00
| Training Device=xla:1 Step=100 Loss=0.00777 Rate=11537.33 GlobalRate=8935.77 Time=01:43:00
| Training Device=xla:1 Step=120 Loss=0.01212 Rate=11278.33 GlobalRate=9233.99 Time=01:43:00
| Training Device=xla:1 Step=140 Loss=0.00998 Rate=12112.54 GlobalRate=9603.29 Time=01:43:00
| Training Device=xla:1 Step=160 Loss=0.02641 Rate=10176.85 GlobalRate=9507.95 Time=01:43:01
| Training Device=xla:1 Step=180 Loss=0.00703 Rate=10549.59 GlobalRate=9635.20 Time=01:43:01
| Training Device=xla:1 Step=200 Loss=0.02201 Rate=11063.13 GlobalRate=9786.35 Time=01:43:01
| Training Device=xla:1 Step=220 Loss=0.01694 Rate=11870.98 GlobalRate=9977.21 Time=01:43:01
| Training Device=xla:1 Step=240 Loss=0.03702 Rate=12365.11 GlobalRate=10157.65 Time=01:43:01
| Training Device=xla:1 Step=260 Loss=0.04479 Rate=12295.61 GlobalRate=10292.32 Time=01:43:02
| Training Device=xla:1 Step=280 Loss=0.02467 Rate=12445.83 GlobalRate=10425.57 Time=01:43:02
| Training Device=xla:1 Step=300 Loss=0.02186 Rate=12715.59 GlobalRate=10560.00 Time=01:43:02
| Training Device=xla:1 Step=320 Loss=0.03878 Rate=11488.12 GlobalRate=10566.71 Time=01:43:02
| Training Device=xla:1 Step=340 Loss=0.01772 Rate=11001.48 GlobalRate=10573.18 Time=01:43:03
| Training Device=xla:1 Step=360 Loss=0.00682 Rate=11164.29 GlobalRate=10609.63 Time=01:43:03
| Training Device=xla:1 Step=380 Loss=0.01867 Rate=11689.93 GlobalRate=10676.26 Time=01:43:03
| Training Device=xla:1 Step=400 Loss=0.01424 Rate=11223.13 GlobalRate=10687.77 Time=01:43:03
| Training Device=xla:1 Step=420 Loss=0.01680 Rate=11231.46 GlobalRate=10712.64 Time=01:43:03
| Training Device=xla:1 Step=440 Loss=0.00856 Rate=11877.08 GlobalRate=10775.97 Time=01:43:04
| Training Device=xla:1 Step=460 Loss=0.02040 Rate=11015.89 GlobalRate=10761.02 Time=01:43:04
| Test Device=xla:1 Accuracy=99.00 Time=01:43:05
Epoch: 18, Mean Accuracy: 99.00%
Max Accuracy: 99.07%
.
----------------------------------------------------------------------
Ran 1 test in 151.185s

OK

So, what do you think?

haha, I could only capture the start of the run in that little screenshot.

tyoc213 commented 3 years ago

@zcain117 @JackCaoG congrats, it is working!

So I checked this:

>>> xm.get_xla_supported_devices("CPU")
['xla:0']
>>> xm.get_xla_supported_devices("GPU")
['xla:1']
>>> xm.get_xla_supported_devices("TPU")
>>> 

Then

$ GPU_NUM_DEVICES=1 time python test/test_operations.py 
test/test_operations.py:1114: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  self.assertIsNone(a.grad)
----------------------------------------------------------------------
Ran 123 tests in 41.180s

OK
35.51user 16.89system 0:42.66elapsed 122%CPU (0avgtext+0avgdata 5418048maxresident)k
58992inputs+12568outputs (248major+2415509minor)pagefaults 0swaps
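
As an extra check that the GPU-backed device is really the one doing the compute (a minimal sketch, assuming the build above and GPU_NUM_DEVICES=1 in the environment):

import torch
import torch_xla.core.xla_model as xm

dev = xm.xla_device()            # picks the default XLA device, xla:1 here
a = torch.randn(1024, 1024, device=dev)
b = torch.randn(1024, 1024, device=dev)
c = a @ b
xm.mark_step()                   # forces the lazy graph to actually execute
print(c.device, float(c.sum()))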

I was going to say it is training for 1 TPU device... should that say TPU core? Or just an xm core? (see screenshot)

| Test Device=xla:1 Accuracy=41.03 Time=11:49:58
Epoch: 1, Mean Accuracy: 41.03%
| Test Device=xla:1 Accuracy=55.99 Time=11:50:28
Epoch: 2, Mean Accuracy: 55.99%
Epoch: 3, Mean Accuracy: 60.40%
Epoch: 4, Mean Accuracy: 64.90%
Epoch: 5, Mean Accuracy: 68.38%
Epoch: 10, Mean Accuracy: 76.36%
Epoch: 11, Mean Accuracy: 73.10%
Epoch: 12, Mean Accuracy: 81.47%
Epoch: 13, Mean Accuracy: 80.90%
Epoch: 14, Mean Accuracy: 82.41%
Epoch: 15, Mean Accuracy: 80.65%
Epoch: 16, Mean Accuracy: 80.36%
Epoch: 17, Mean Accuracy: 83.71%
Epoch: 18, Mean Accuracy: 80.88%
Epoch: 19, Mean Accuracy: 73.21%
Epoch: 20, Mean Accuracy: 80.08%
Epoch: 21, Mean Accuracy: 81.89%
Epoch: 22, Mean Accuracy: 82.23%
Epoch: 23, Mean Accuracy: 81.05%
Epoch: 24, Mean Accuracy: 83.24%
Epoch: 25, Mean Accuracy: 80.92%
Max Accuracy: 83.71%
.
----------------------------------------------------------------------
Ran 1 test in 844.780s

OK

Imagenette is missing from /tmp/imagenette, so I could only run with fake data:

GPU_NUM_DEVICES=1 python test/test_train_imagenet.py --fake_data >~train_img.log 2>&1

Is there a way to limit memory so I can test this on my GPU? I don't know why it generated such a large log, train_img.log.


You have already seen the MNIST training. I was also trying to test the multiprocessing path, but:

$ GPU_NUM_DEVICES=1 python test/test_train_mp_mnist.py 
Traceback (most recent call last):
  File "test/test_train_mp_mnist.py", line 190, in <module>
    xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=FLAGS.num_cores)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 384, in spawn
    pf_cfg = _pre_fork_setup(nprocs)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 210, in _pre_fork_setup
    _setup_workers(num_devices)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch_xla-1.6-py3.6-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 157, in _setup_workers
    m = re.match(r'(.*):(\d+)$', worker.host_port)
AttributeError: 'str' object has no attribute 'host_port'

Same error for mp_imagenet.py.

Maybe this makes no sense, but I was thinking: if I get 2 GPUs installed, do I need 8, or are 2 OK to test multiprocessing? (Do they need to be the exact same model, or are different ones OK?)

Some extra questions:

tyoc213 commented 3 years ago

Hi there people, just letting you know I wrote a condensed post about how to compile locally: https://tyoc213.github.io/blog/xla/fastai/2020/11/28/compiling-xla-locally.html

By the way, I don't know why the .h files are not copied over in the docker build; I only found that out by trial and error, but I guess just copying all of them will do.

Also linking the last missing piece: https://github.com/pytorch/pytorch/issues/31943#issuecomment-637770008

JackCaoG commented 3 years ago

@tyoc213 Sorry I missed the email, nice blog post! Glad to see it is working for you.

tyoc213 commented 3 years ago

Well, @JackCaoG, I would love to try taking this to the "next level" (no promises), but if I were to take a look at lowering an op, which one do you think would be the "easiest" one to start with?

JackCaoG commented 3 years ago

@tyoc213 That sounds good! Looking at the currently open op lowering requests, I would probably pick one of ctc_loss, grid_sampler_2d and grid_sampler_3d. I didn't spend too much time looking into how to lower them, but I don't think any of them is very obvious (if it only took 2 hours I would have done it already 😄). You can take a look at https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md and ask questions if there is anything missing from that documentation. I am happy to help.

tyoc213 commented 3 years ago

@JackCaoG Well, both of them seem more complex than the simple example :) Let me take a closer look, but for the moment:

I think it is not clear whether gen.py should be modified, because the 2 examples provided do modify gen.py. The names of ctc_loss, grid_sampler_2d and grid_sampler_3d, for example, are in aten_xla_type_default. I think the "PyTorch tensor api" link points to https://pytorch.org/docs/stable/index.html, so it seems I should just copy-paste the signature of any of them to get the entry point in aten_xla_type. Then it looks like at::Tensor -> XLATensor, which generates IR -> ir::ops, which in turn produce a sequence of XlaOps. I need to look at this more closely, because I thought that writing XLA tensor ops would be enough to implement something new.
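
One way to confirm that an op is currently going through the aten_xla_type_default fallback rather than a real lowering (a sketch based on the metrics report, assuming torch_xla.debug.metrics is available in this build):

import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

dev = xm.xla_device()
inp = torch.randn(1, 1, 4, 4, device=dev)
grid = torch.rand(1, 2, 2, 2, device=dev) * 2 - 1   # grid coordinates in [-1, 1]
out = F.grid_sample(inp, grid, align_corners=False)
xm.mark_step()

# Ops that fall back to CPU show up as aten::* counters in the report.
print(met.metrics_report())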

I also found this in PyTorch's op dependency list:

- name: aten::_ctc_loss
  depends: aten::empty, aten::eq, aten::fill_, aten::is_nonzero, aten::narrow, aten::permute, aten::size, aten::stride
- name: aten::_ctc_loss_backward
  depends: aten::empty_like, aten::eq, aten::fill_, aten::full_like, aten::is_nonzero, aten::narrow, aten::permute, aten::size, aten::stride, aten::zero_

- name: aten::grid_sampler
  depends: aten::cudnn_grid_sampler, aten::eq, aten::grid_sampler_2d, aten::grid_sampler_3d, aten::is_nonzero, aten::size, aten::stride
- name: aten::grid_sampler_2d
  depends: aten::empty, aten::eq, aten::is_nonzero, aten::size, aten::stride
- name: aten::grid_sampler_2d_backward
  depends: aten::contiguous, aten::empty_like, aten::eq, aten::is_nonzero, aten::size, aten::stride, aten::zero_, aten::zeros_like
- name: aten::grid_sampler_3d
  depends: aten::empty, aten::eq, aten::is_nonzero, aten::size, aten::stride
- name: aten::grid_sampler_3d_backward
  depends: aten::empty_like, aten::eq, aten::is_nonzero, aten::size, aten::stride, aten::zero_, aten::zeros_like

But I don't know if that helps with anything.
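
For orientation on what ctc_loss actually computes, this is the CPU reference behaviour a lowering would have to reproduce (a sketch with made-up shapes, not tied to XLA at all):

import torch
import torch.nn.functional as F

T, N, C, S = 50, 4, 20, 10                    # time steps, batch, classes (incl. blank), target length
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (N, S), dtype=torch.long)    # label 0 is reserved for the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

print(F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0))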

Q: Do all ops in https://www.tensorflow.org/xla/operation_semantics map to https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/client/xla_builder.h? I see that some functions are not documented; only the XlaOps are documented?

Q: It seems that ctc_loss applies an algorithm, while the samplers juggle with accessing slices and doing interpolation? I guess I need to read more, and yeah, it's not very obvious. The only thing I can think of for the moment is that maybe I will need slices for the samplers and some random distribution for ctc, but I'm not sure. Probably by the time I get it you will already be ready to implement them, but time will tell. I would say that indices look easier... but right now I wonder.

Q: Is it OK if I continue asking here about this subject? Should I reopen this issue?

Q: I also feel like I will start finding bugs, though they will probably be ones introduced by my own setup... should I report them? Like #2670.


Or maybe like this:


device0 = xm.xla_device(devkind='CPU')
device1 = xm.xla_device(devkind='GPU')
print('CPU', device0)
print('GPU', device1)

I got:

CPU xla:0
GPU xla:1

But when trying to do a move on top of a move (well, doing some tests and probably doing something wrong, just trying to see what happens), I got this:

Traceback (most recent call last):
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tyoc213/.vscode-oss/extensions/ms-python.python-2020.8.105369/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/tyoc213/.vscode-oss/extensions/ms-python.python-2020.8.105369/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/tyoc213/.vscode-oss/extensions/ms-python.python-2020.8.105369/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tyoc213/Documents/github/pytorch/xla/x_to_xla/to_xla.py", line 289, in <module>
    l(0, {})
  File "/home/tyoc213/Documents/github/pytorch/xla/x_to_xla/to_xla.py", line 173, in l
    tpu_learner.fit(1) #, cbs=[check()])
  File "/home/tyoc213/Documents/github/fastai/fastai/learner.py", line 201, in fit
    if reset_opt or not self.opt: self.create_opt()
  File "/home/tyoc213/Documents/github/fastai_xla_extensions/fastai_xla_extensions/core.py", line 181, in create_opt
    self.move2_xla_device()
  File "/home/tyoc213/Documents/github/fastai_xla_extensions/fastai_xla_extensions/core.py", line 152, in move2_xla_device
    self.model.to(xla_model_device)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 629, in to
    return self._apply(convert)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 359, in _apply
    module._apply(fn)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 381, in _apply
    param_applied = fn(param)
  File "/home/tyoc213/miniconda3/envs/xla/lib/python3.6/site-packages/torch/nn/modules/module.py", line 627, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:382 : Check failed: session->session()->Run( session_work->feed_inputs, session_work->outputs_handles, &outputs) == ::tensorflow::Status::OK() (Invalid argument: Cannot assign a device for operation XRTAllocateFromTensor_1: {{node XRTAllocateFromTensor_1}} was explicitly assigned to /job:localservice/replica:0/task:0/device:XLA_GPU:0 but available devices are [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
         [[XRTAllocateFromTensor_1]] vs. OK)
*** Begin stack trace ***
        tensorflow::CurrentStackTrace()

        xla::util::MultiWait::Complete(std::function<void ()> const&)

        clone
*** End stack trace ***

Which suggests that there is no XLA GPU devkind, and that the two xla devices are CPUs [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ], but I think the first example just showed that I can get the correct devkind.
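
One thing that might help narrow this down (a sketch, assuming xm.xla_real_devices is available in this build; it is what xla_multiprocessing uses internally) is printing the underlying XRT device behind each xla:N alias:

import torch_xla.core.xla_model as xm

devices = xm.get_xla_supported_devices()
print(devices)                       # e.g. ['xla:0', 'xla:1']
print(xm.xla_real_devices(devices))  # e.g. ['CPU:0', 'GPU:0'] if the GPU is really registered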

JackCaoG commented 3 years ago

Hi @tyoc213, I agree that we should probably start with something simple. Let me check with @ailzhang and we should be able to find an easier op for you to start with and experiment.

I think we only document XlaOps, but you can call helper functions defined in XlaBuilder if they are useful. Sorry I missed your other issue; let me try to reply to that issue directly.

tyoc213 commented 3 years ago

@JackCaoG well, I will give a second shot at understanding the 2 ops suggested above. Which functions are the most related ones to look at? Or, if there are any, which XLA ops should I look at to take as a base or as a related implementation?

I also see that there is GenericSlice, which I guess will work to "slice" the "grid samples" (I will need to look more into this, it's just my current hypothesis), but I have been wondering whether it is not necessary to first have affine_grid or meshgrid? (I can't seem to find their impl for XLA in aten_xla_type_default.h.)

I have already read them and I think I "get them"; I also found this: https://discuss.pytorch.org/t/how-does-grid-sample-x-grid-work/15401/4, so I will give the samplers another go :).
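
As a CPU reference for how those pieces fit together (a sketch; affine_grid builds the sampling grid that grid_sample then bilinearly interpolates from, which is the behaviour a lowering would need to match):

import torch
import torch.nn.functional as F

x = torch.arange(16.).reshape(1, 1, 4, 4)        # N, C, H, W

# Identity affine transform: sampling at every pixel center reproduces the input.
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])           # N, 2, 3
grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)   # N, H, W, 2
out = F.grid_sample(x, grid, mode='bilinear', align_corners=False)

print(torch.allclose(out, x, atol=1e-6))         # True for the identity grid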