Closed: henrykironde closed this issue 4 months ago.
Some more error output from the terminal; the report above was from RStudio:
> model$trainer$fit(model)
| Name | Type | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M Trainable params
222 K Non-trainable params
32.1 M Total params
128.592 Total estimated model params size (MB)
/Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:327: UserWarning: The number of training samples (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
/Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:382: UserWarning: One of given dataloaders is None and it will be skipped.
rank_zero_warn("One of given dataloaders is None and it will be skipped.")
Epoch 0: 0%| | 0/1 [00:00<00:00, 4782.56it/s][W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort R
Looks like there is a crash in the libiomp binary:
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
Some references:
1) https://github.com/dmlc/xgboost/issues/1715
2) https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial
This worked for me after setting Sys.setenv("KMP_DUPLICATE_LIB_OK" = "TRUE").
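For anyone driving the underlying Python package directly, a minimal sketch of the same unsafe workaround; note the variable must be set before any OpenMP-linked library (such as torch) is imported, so the torch import here is illustrative and commented out:

```python
import os

# Unsafe, unsupported workaround from the OMP hint above: allow the process
# to continue with two OpenMP runtimes loaded. This must run BEFORE importing
# torch/numpy or anything else that pulls in an OpenMP runtime.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# import torch  # would now proceed instead of aborting with OMP Error #15
```

As the Intel hint itself says, this can cause crashes or silently wrong results; removing the duplicate runtime is the real fix.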
We have to be careful, since libiomp5.dylib and libomp.dylib may give us different results.
I hit the same OMP issue on Windows 10 when running model = df_model():
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized. OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
What do you suggest for "[the best thing to do is] to ensure that only a single OpenMP runtime is linked into the process"?
Your solution using Sys.setenv("KMP_DUPLICATE_LIB_OK" = "TRUE") seems risky for actual use in a production environment (there is no way to know if and when it may cause issues).
Thanks in advance
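One partial mitigation for the production concern (a sketch, not an official deepforestr recommendation; should_allow_duplicate_omp is a hypothetical helper) is to scope the unsafe flag to the platforms where the abort has actually been reported, and to respect any value the deployment already sets:

```python
import os
import platform

def should_allow_duplicate_omp(system):
    """Apply the unsafe flag only where the duplicate-runtime abort was reported."""
    # macOS ("Darwin") and Windows reports appear in this thread; Linux is unaffected.
    return system in ("Darwin", "Windows")

if should_allow_duplicate_omp(platform.system()):
    # setdefault keeps any value the environment already provides.
    os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
```

This narrows the blast radius but does not remove the underlying duplicate runtime.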
I've now fixed the OMP issue via a change in the installation instructions that removes the mkl package, which was causing this issue: e10c158cfa846b2b683d8595336eb04219c9960b
Can someone using macOS follow the new installation instructions and see if the rest of the issues reported here remain? I'm still seeing training issues on Windows, but things now work properly for predicting from the release model.
Using macOS, I ran into the following issues during installation:
reticulate::conda_remove('r-reticulate', packages = 'mkl')
returned the following:
+ '~/Library/r-miniconda/bin/conda' 'remove' '--yes' '--name' 'r-reticulate' 'mkl'
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
PackagesNotFoundError: The following packages are missing from the target environment:
Error: Error 1 occurred removing conda environment r-reticulate
After this error, I continued with the installation anyway...
I had to run install.packages('devtools') (not included in the installation code) before running devtools::install_github('weecology/deepforestr').
It seems that any code calling the df_model() function crashes RStudio. Examples that have caused a crash:
model <- df_model()
deepforestr::df_model()
Thanks for the report @mirandateats! Unfortunately we've had ongoing stability issues with reticulate (which is how we run the core Python package from within R) on non-Linux systems. We'll keep trying to address those issues, but at the moment my recommendation is to do the core DeepForest work using the Python package directly and then import the results to R for further analysis and visualization.
@mirandateats - it looks like some of the upstream issues have been resolved now and I have things running properly on Windows 10. Can you try a fresh install and let me know if you're still running into issues?
@spono - after some upstream fixes everything seems to be working on Windows now. Can you try a fresh install and then see if the test code below runs:
library(deepforestr)
model = df_model()
model$use_release()
annotations_file = get_data("testfile_deepforest.csv")
model$config$train$csv_file = annotations_file
model$config$train$root_dir = get_data(".")
model$create_trainer()
model$trainer$fit(model)
@henrykironde - can you test again on macOS since our upstream issues seem to be resolved now (at least on Windows)?
@henrykironde and @ethanwhite - I'm curious if you've resolved this issue. I ran into the same problem on macOS yesterday. After a basic install according to the directions on the website, RStudio crashed when I ran model = df_model().
Thank you.
Thanks for the report @robAndrus34! We haven't managed to reproduce this locally, in part due to not having many Macs in the lab. If you have time to work with us on debugging on macOS we'd be happy to do that. If you need to get something up and running quickly then it's pretty easy to do in Python, even if you don't do much Python work. Let us know which direction you'd like to go and we'll be happy to help.
Thanks @ethanwhite. I decided to go the Python route for now. At some future date I may be interested in troubleshooting the R issue. Thanks!
Sounds good @robAndrus34 - let us know if you have any questions as you get things up and running in Python
This failure is now reflected in our failing macOS tests which may help us explore this further.
Tests are now passing for macOS on non-M1 chips, and everything is working in local tests on Linux and Windows, including RStudio, so I'm going to go ahead and close this issue. Please open a new issue with detailed information if you run into further problems.