Open gilbertocamara opened 5 months ago
Can you show me the output of torch::install_torch(reinstall = TRUE)? Also, I'm assuming it doesn't fail if you run e.g. torch_randn(10)?
Sure!
torch::install_torch(reinstall = TRUE)
trying URL 'https://github.com/mlverse/libtorch-mac-m1/releases/download/LibTorch-for-R/libtorch-v2.0.1.zip'
Content type 'application/octet-stream' length 49631992 bytes (47.3 MB)
==================================================
downloaded 47.3 MB
trying URL 'https://torch-cdn.mlverse.org/binaries/refs/heads/main/latest/lantern-0.12.0.9000+cpu+arm64-Darwin.zip'
Content type 'application/zip' length 3602457 bytes (3.4 MB)
==================================================
downloaded 3.4 MB
✔ torch dependencies have been installed.
ℹ You must restart your session to use torch correctly.
Running a simple command such as torch_randn(10) works.
torch::torch_randn(10)
torch_tensor
0.8753
0.9061
-1.8905
-0.2683
-0.4204
-0.3306
1.1119
0.0052
0.3246
-0.2530
[ CPUFloatType{10} ]
torch can also access the MPS device on the M3. The following works.
x <- torch::torch_randn(10, 10)$to(device="mps")
y <- torch::torch_randn(10, 10)$to(device="mps")
torch::torch_mm(x, y)
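Before moving tensors to the GPU, it can help to confirm that torch itself reports MPS support. The following is a minimal sketch, assuming torch::backends_mps_is_available() is present in the installed torch version; pick_device is a hypothetical helper name and not part of torch.

```r
# Hypothetical helper: returns "mps" only when torch is installed and
# reports MPS support, and falls back to "cpu" in every other case.
pick_device <- function() {
  has_mps <- tryCatch(
    requireNamespace("torch", quietly = TRUE) &&
      torch::backends_mps_is_available(),
    error = function(e) FALSE
  )
  if (isTRUE(has_mps)) "mps" else "cpu"
}

device <- pick_device()
print(device)
```

The returned string can then be passed as the device argument, e.g. torch::torch_randn(10, 10)$to(device = device).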
The problems appear in the luz::fit() function. We compiled the lantern library from source and tried to install it as follows.
# compiled lantern from source and configured env variables as follows
devtools::install(build = FALSE)
Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
/Users/gilberto/torch --install-tests
* installing to library ‘/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library’
* installing *source* package ‘torch’ ...
** using staged installation
CMAKE_FLAGS:
** libs
using C++ compiler: ‘Apple clang version 15.0.0 (clang-1500.3.9.4)’
using SDK: ‘MacOSX14.4.sdk’
*** Building lantern!
mkdir -p ../build-lantern
cd ../build-lantern && cmake ../src/lantern -DCMAKE_INSTALL_PREFIX=/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/00LOCK-torch/00new/torch -DCMAKE_INSTALL_MESSAGE="LAZY" && cmake --build . --target install --config Release
### Lots of output...
-- Build files have been written to: /Users/gilberto/torch/build-lantern
## We then configured the env variables
Sys.setenv(LANTERN_URL="/Users/gilberto/torch/build-lantern")
Sys.setenv(TORCH_URL="/Users/gilberto/torch/build-lantern/libtorch")
## We then tried to install torch after this, but it fails
Either there is a problem with the lantern code when running on the M3, or we failed to install it correctly after compiling from source.
You might want to try setting the env var BUILD_LANTERN=1 and then running remotes::install_github("mlverse/torch") to build lantern from source. That said, I don't think lantern is the culprit here, as it's just a relatively thin wrapper around LibTorch. You might also need to build LibTorch from source.
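As a rough sketch of that suggestion, the environment variable can be set from within the R session before installing. The wrapper name install_torch_from_github is hypothetical and exists only so that nothing is installed until it is called explicitly; the remotes package is assumed to be installed.

```r
# Ask the torch build to compile lantern from source instead of
# downloading a pre-built copy.
Sys.setenv(BUILD_LANTERN = "1")

# Hypothetical wrapper: installation (and the long compile step) only
# starts when this function is called.
install_torch_from_github <- function() {
  remotes::install_github("mlverse/torch")
}

# install_torch_from_github()  # run interactively; expect a long build
```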
Also, have you tried installing the pre-built binaries with, e.g.:
kind <- "cpu"
version <- "0.12.0.9000"
options(repos = c(
  torch = sprintf("https://torch-cdn.mlverse.org/packages/%s/%s/", kind, version),
  CRAN = "https://cloud.r-project.org" # or any other from which you want to install the other R dependencies.
))
install.packages("torch", type = "binary")
Thanks! I have tried, but failed.
Can you also try disabling MPS in luz, just so we can narrow down the problem a little more?
You can do something like:
torch_model <- luz::fit(
  torch_model,
  data = list(train_x, train_y),
  epochs = 100,
  valid_data = list(test_x, test_y),
  callbacks = list(luz::luz_callback_early_stopping(
    patience = 20,
    min_delta = 0.01
  )),
  verbose = TRUE,
  # force training on the CPU so that MPS is not used
  accelerator = luz::accelerator(cpu = TRUE)
)
Works!!! Can we now make luz work on MPS?
I think we will need to figure out why torch fails on M3 + MPS for that model. I believe it's possible that you will need to build LibTorch from source to fix this issue.
How do I build libtorch and liblantern from source?
To build LibTorch from source, you can follow the steps in this workflow file:
https://github.com/mlverse/libtorch-mac-m1/blob/main/.github/workflows/libtorch.yaml
Then copy the libtorch files into src/lantern/build and run devtools::load_all() or devtools::install() with BUILD_LANTERN=1 set.
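The copy-and-install step could be scripted roughly as below. This is only a sketch under stated assumptions: the two paths and both function names are hypothetical, and it assumes the libtorch folder itself should end up inside src/lantern/build; check the torch build scripts if the expected layout differs.

```r
# Hypothetical locations: where LibTorch was built and where the torch
# repository was cloned. Adjust both to your machine.
libtorch_dir <- "~/libtorch"
torch_repo   <- "~/torch"

# Copy a LibTorch build into <torch_repo>/src/lantern/build.
# Assumed layout: the libtorch folder is placed inside build/.
stage_libtorch <- function(libtorch_dir, torch_repo) {
  dest <- file.path(torch_repo, "src", "lantern", "build")
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(libtorch_dir, dest, recursive = TRUE)
  if (!all(ok)) stop("could not copy ", libtorch_dir, " into ", dest)
  invisible(dest)
}

# Stage the files, then build lantern and install the package.
install_with_local_libtorch <- function(libtorch_dir, torch_repo) {
  stage_libtorch(libtorch_dir, torch_repo)
  Sys.setenv(BUILD_LANTERN = "1")
  devtools::install(torch_repo)
}

# install_with_local_libtorch(libtorch_dir, torch_repo)  # long-running
```

The install call is left commented out because it triggers a full compile; stage_libtorch can be run on its own first to verify that the files land where expected.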
Thanks!! I will try
Dear @dfalbel, we tried to build torch from source, but it did not work on the Mac M3 chip. Looking at the PyTorch GitHub repository, other developers are having similar problems with the new M3 chip. Please see the following issue:
Hello. I had a similar issue, in my case after upgrading to macOS Sonoma 14.4.1 on a Mac M2. I posted on the luz GitHub, but was happy to see some discussion here.
Dear @dfalbel, I have bought a new MacBook Air with the M3 chip, which has 8 CPU cores, 10 GPU cores and 16 GB of integrated memory. My R torch apps are crashing. I have put together an MWE which works on all other architectures, including a MacBook Air M1 and a Mac mini. The OS is the same (Sonoma 14.5). The MWE follows:
The error occurs in the luz::fit function. Inside RStudio, the code gets stuck and then RStudio asks to restart R. When running R from the terminal, the output is:
The sessionInfo() output is as follows: