mlverse / torch

R Interface to Torch
https://torch.mlverse.org

Problems with liblantern on ARM (Raspberry Pi, Jetson Nano) #262

Open bearloga opened 4 years ago

bearloga commented 4 years ago

Hi! First of all, huge congratulations on the first release! Very exciting!!!

This isn't a critical issue since most (if not all) people are going to be using this library on their macOS, Windows, and Linux x86 PCs. I encountered it when I went "hm, I wonder if it will work" and I suspect the reason it didn't is that there aren't any official ARMv7 or ARMv8 builds of Torch. So there's not much for the maintainers to do – I'm just documenting the errors I got trying to install/use on the micro-PCs Raspberry Pi 4 & NVIDIA Jetson Nano (which has a GPU) in case anyone else tries and wonders why it's not working.

On Raspberry Pi 4 Model B (with Raspberry Pi OS, based on Debian Buster):

R> install.packages("torch")
...
installing to /home/mikhail/R/arm-unknown-linux-gnueabihf-library/3.5/torch/libs
...
* DONE (torch)

R> torch::install_torch()
trying URL 'https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.5.0%2Bcpu.zip'
...
trying URL 'https://storage.googleapis.com/torch-lantern-builds/refs/heads/cran/v0.1.0/latest/Linux-cpu.zip'
...
Error in cpp_lantern_init(normalizePath(install_path())) : 
  /home/mikhail/R/arm-unknown-linux-gnueabihf-library/3.5/torch/deps/liblantern.so -
  /home/mikhail/R/arm-unknown-linux-gnueabihf-library/3.5/torch/deps/liblantern.so: wrong ELF class: ELFCLASS64
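The "wrong ELF class: ELFCLASS64" error means the downloaded liblantern.so is a 64-bit binary, while Raspberry Pi OS (and therefore this R build) is 32-bit ARM, so it can't be loaded. A minimal sketch to confirm the mismatch, assuming the standard file utility is available and that lantern lands in the package's deps/ directory as the error message suggests:

# 32-bit R reports a 4-byte pointer size; the prebuilt lantern binary is 64-bit
.Machine$sizeof.pointer

# inspect the architecture of the downloaded shared object
lib <- file.path(system.file("deps", package = "torch"), "liblantern.so")
system2("file", shQuote(lib))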

On Jetson Nano Developer Kit (Ubuntu 18.04.5 LTS; JetPack 4.4 with CUDA 10.2 support):

R> install.packages("torch")
...
installing to /home/mikhail/R/aarch64-unknown-linux-gnu-library/3.4/torch/libs
...
* DONE (torch)

R> torch::install_torch()
trying URL 'https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.5.0%2Bcu102.zip'
...
trying URL 'https://storage.googleapis.com/torch-lantern-builds/refs/heads/cran/v0.1.0/latest/Linux-gpu-102.zip'
...
Error in cpp_lantern_init(normalizePath(install_path())) : 
  /home/mikhail/R/aarch64-unknown-linux-gnu-library/3.4/torch/deps/liblantern.so -
  /home/mikhail/R/aarch64-unknown-linux-gnu-library/3.4/torch/deps/liblantern.so: cannot open shared object file: No such file or directory

NVIDIA provides CUDA-enabled PyTorch 1.6 wheels for the Jetson; here's a quick test showing that build sees the GPU:

$> python3
>>> import torch
>>> torch.cuda.current_device()
0
>>> torch.cuda.get_device_name(0)
'NVIDIA Tegra X1'
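The equivalent check on the R side, once an aarch64 build of lantern exists (today it fails with the error above), would just be the package's standard CUDA helpers; nothing here is Jetson-specific:

# standard CUDA checks in the torch R package; these currently error on the Nano
# because no aarch64 liblantern.so is ever installed
torch::cuda_is_available()
torch::cuda_device_count()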
dfalbel commented 4 years ago

Hi @bearloga! thanks very much!

Interesting!!

It seems that it's possible to compile LibTorch on Raspberry Pi (e.g. https://stackoverflow.com/questions/62755739/libtorch-on-raspberry-cant-load-pt-file-but-working-on-ubuntu). We could then, in theory, add a case for it here: https://github.com/mlverse/torch/blob/master/lantern/CMakeLists.txt#L33-L57 so we could build lantern on Raspberry Pi OS. Sounds like a nice project, though I'd expect ~6h compilation times :)

jonthegeek commented 4 years ago

Well, I didn't care about this, but now I definitely do! Realistically, though, I can imagine running something like https://twitter.com/RBERTbot on my Raspberry Pi, so I'd definitely want to at least be able to predict on there.

Following, and maybe I'll find time to play around to see if I can get it working.

znmeb commented 4 years ago

I've got a Jetson Nano and a Jetson AGX Xavier and want to run torch. Can the installer simply compile liblantern.so from source as well as libtorch itself?

See issue https://github.com/znmeb/edgyR/issues/32

dfalbel commented 4 years ago

Torch for R can be built from source with https://github.com/mlverse/torch/blob/master/tools/buildlantern.R and then devtools::install. However, we assume a pre-built LibTorch binary, which is currently not provided by the PyTorch team for ARM CPUs.

We would need to compile LibTorch; in theory we could use https://github.com/pytorch/pytorch/blob/master/tools/build_libtorch.py
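For anyone who wants to try, a rough sketch of that from-source route, run from a clone of mlverse/torch and assuming a working C++ toolchain, CMake, and a LibTorch the build can find (the exact environment the build script expects isn't covered here):

# build lantern locally, then install the R package from the source tree
source("tools/buildlantern.R")
devtools::install()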

znmeb commented 4 years ago

> Torch for R can be built from source with https://github.com/mlverse/torch/blob/master/tools/buildlantern.R and then devtools::install. However, we assume a pre-built LibTorch binary, which is currently not provided by the PyTorch team for ARM CPUs.
>
> We would need to compile LibTorch; in theory we could use https://github.com/pytorch/pytorch/blob/master/tools/build_libtorch.py

Thanks! I'm doing everything in a Jetson-specific Docker container which already has NVIDIA's PyTorch, torchvision, and torchaudio installed. So I'll need a way to link against those binaries/headers, if possible, rather than building LibTorch myself. I'll be trying that this afternoon (about 15:00 Pacific Daylight Time). It'll be in a branch of https://github.com/znmeb/edgyR if I can get it working.

znmeb commented 4 years ago

libtorch.so is indeed present on a Jetson if you have the NVIDIA PyTorch build installed; liblantern.so is not:

root@edgyr:/usr/local/src# locate libtorch
/usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so
/usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so
/usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so
/usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_global_deps.so
/usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so
root@edgyr:/usr/local/src# locate lantern

I don't have a Raspberry Pi so I'm no help there, but if I can build liblantern against the libtorch on the Jetson, I should be able to make things work.

For other Jetson people, here are the NVIDIA PyTorch install docs: https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-6-0-now-available/72048. I'm using a Docker base image with PyTorch already on it, so I don't need to install it myself.
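If linking against the wheel's LibTorch works out, one generic way to point a CMake build such as lantern's at it is the prefix the Python package exports. This is a sketch based on standard CMake conventions, not on lantern's actual build options, and the lantern source path is a placeholder:

# locate the CMake config shipped inside NVIDIA's PyTorch wheel
# (assumes the installed wheel exposes torch.utils.cmake_prefix_path)
prefix <- system2(
  "python3",
  c("-c", shQuote("import torch; print(torch.utils.cmake_prefix_path)")),
  stdout = TRUE
)

# hypothetical configure/build of lantern against that LibTorch;
# CMAKE_PREFIX_PATH is a generic CMake variable, not a lantern-specific flag
system2("cmake", c(paste0("-DCMAKE_PREFIX_PATH=", prefix), "path/to/torch/lantern"))
system2("cmake", c("--build", "."))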

znmeb commented 4 years ago

One other NVIDIA Jetson note: the Jetson operating system is Linux for Tegra (L4T), which is Ubuntu 18.04 LTS "Bionic Beaver" for arm64 with most of the NVIDIA tooling pre-installed. I tried building liblantern on it, but the native CMake is too old, and there are no ARM binaries for newer CMake versions in the upstream repository, so I had to build CMake from source.

This isn't a show stopper for me; I'm doing my own releases and everything is done in Docker containers. But it might be a problem for CI setups.

znmeb commented 2 years ago

Revisiting this: the project is now https://github.com/AlgoCompSynth/AlgoCompSynth-One. I now have a setup for building PyTorch from source on Jetsons; it takes a few hours on my fastest machine, but the wheels work. Also, the Jetson Xavier NX and AGX Xavier can run the JetPack 5.0 Developer Preview, which is built on Ubuntu 20.04 LTS and Python 3.8. Quite a few Python projects seem to have abandoned support for Python 3.6, and I may do the same. :-)