Closed turgut090 closed 3 years ago
It would be helpful if you could share the error message you're seeing (or better yet a reproducible example).
This happens only with the dev version of reticulate.
Error:
>>> learn.fit_one_cycle(2)
epoch train_loss valid_loss accuracy time
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
RuntimeError: DataLoader worker (pid(s) 4594, 4595) exited unexpectedly
Second time running:
learn.fit_one_cycle(2)
epoch train_loss valid_loss accuracy time
Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd8a13a76a0>>
Traceback (most recent call last):
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1101, in __del__
self._shutdown_workers()
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1075, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/multiprocessing/process.py", line 122, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd8a13a76a0>>
Traceback (most recent call last):
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1101, in __del__
self._shutdown_workers()
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1075, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/multiprocessing/process.py", line 122, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd8a13a76a0>>
Traceback (most recent call last):
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1101, in __del__
self._shutdown_workers()
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1075, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/multiprocessing/process.py", line 122, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Exception ignored in: <bound method _MultiProcessingDataLoaderIter.__del__ of <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7fd8a13a76a0>>
Traceback (most recent call last):
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1101, in __del__
self._shutdown_workers()
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1075, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File "/home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/multiprocessing/process.py", line 122, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
RuntimeError: DataLoader worker (pid(s) 4644, 4647) exited unexpectedly
The crash is happening in Python code, so it seems unlikely to be related to the version of reticulate in use.
This is the first thing I thought about. However, it is still unclear why it has to fail with the dev version of reticulate and work with the cran version.
Are you able to confirm that the same version of Python is being used in each case as well?
Absolutely. I just have r-miniconda and one environment, /home/turgut/.local/share/r-miniconda/envs/r-reticulate/.
The only thing that I change is:
devtools::install_github("rstudio/reticulate") # fails
and:
install.packages("reticulate") # success
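One way to double-check that both installs end up on the same interpreter is to print a small fingerprint from inside the Python session in each case and compare the two outputs. This is only a minimal sketch; the helper name `interpreter_fingerprint` is made up for illustration and is not part of reticulate or fastai.

```python
import platform
import sys


def interpreter_fingerprint():
    """Collect the details that identify the running Python interpreter."""
    return {
        "executable": sys.executable,          # path reported for the interpreter binary
        "prefix": sys.prefix,                  # root of the active environment
        "version": platform.python_version(),  # version string, e.g. "3.6.10"
    }


# Run once per reticulate install and compare the printed dictionaries.
print(interpreter_fingerprint())
```

If the two dictionaries match, the difference between the runs is more likely to come from the R side than from the Python environment.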
Hi, Kevin. Is there any update?
Sorry, no. Can you share a standalone reproducible example (something I could easily copy + run to reproduce the issue locally)?
This is what I run. However, PyTorch with CUDA has to be installed to reproduce this error. It fails when it tries to run a task on multiple workers with a GPU, for example image classification.
# pkgs
from fastai.vision.all import *
from fastai.medical.imaging import *
import pydicom
import pandas as pd
# code
pneumothorax_source = untar_data(URLs.SIIM_SMALL)
df = pd.read_csv(f"{pneumothorax_source}/labels.csv")
pneumothorax = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
get_x=lambda x: f"{pneumothorax_source}/{x[0]}",
get_y=lambda x: x[1],
batch_tfms=aug_transforms(size=224))
dls = pneumothorax.dataloaders(df.values)
learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fit_one_cycle(2)
Did you manage to reproduce?
Please see my RMarkdown outputs with the CRAN and GitHub versions:
reticulate
Turgut
10/7/2020
knitr::opts_chunk$set(echo = TRUE)
#reticulate::py_config()
CUDA
system('nvidia-smi',intern = T)
## [1] "Wed Oct 7 22:52:54 2020 "
## [2] "+-----------------------------------------------------------------------------+"
## [3] "| NVIDIA-SMI 450.36.06 Driver Version: 450.36.06 CUDA Version: 11.0 |"
## [4] "|-------------------------------+----------------------+----------------------+"
## [5] "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |"
## [6] "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |"
## [7] "| | | MIG M. |"
## [8] "|===============================+======================+======================|"
## [9] "| 0 GeForce RTX 2060 Off | 00000000:01:00.0 On | N/A |"
## [10] "| 0% 48C P5 37W / 170W | 222MiB / 5931MiB | 2% Default |"
## [11] "| | | N/A |"
## [12] "+-------------------------------+----------------------+----------------------+"
## [13] " "
## [14] "+-----------------------------------------------------------------------------+"
## [15] "| Processes: |"
## [16] "| GPU GI CI PID Type Process name GPU Memory |"
## [17] "| ID ID Usage |"
## [18] "|=============================================================================|"
## [19] "| 0 N/A N/A 1378 G /usr/lib/xorg/Xorg 83MiB |"
## [20] "| 0 N/A N/A 2142 G compiz 31MiB |"
## [21] "| 0 N/A N/A 3206 G ...AAAAAAAAA= --shared-files 58MiB |"
## [22] "| 0 N/A N/A 9387 G /usr/lib/rstudio/bin/rstudio 45MiB |"
## [23] "+-----------------------------------------------------------------------------+"
reticulate::py_config()
## python: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/bin/python
## libpython: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome: /home/turgut/.local/share/r-miniconda/envs/r-reticulate:/home/turgut/.local/share/r-miniconda/envs/r-reticulate
## version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
## numpy: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version: 1.18.5
Reticulate CRAN
from fastai.basics import *
from fastai.callback.all import *
from fastai.vision.all import *
from fastai.medical.imaging import *
import pydicom
import pandas as pd
pneumothorax_source = untar_data(URLs.SIIM_SMALL)
df = pd.read_csv(f"{pneumothorax_source}/labels.csv")
pneumothorax = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
get_x=lambda x: f"{pneumothorax_source}/{x[0]}",
get_y=lambda x: x[1],
batch_tfms=aug_transforms(size=224))
dls = pneumothorax.dataloaders(df.values)
learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fit_one_cycle(2)
## epoch train_loss valid_loss accuracy time
## 0 1.316968 0.854507 0.660000 00:02
## 1 1.163454 0.898112 0.620000 00:02
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 lattice_0.20-41 digest_0.6.25 rappdirs_0.3.1
## [5] grid_4.0.2 jsonlite_1.7.1 magrittr_1.5 evaluate_0.14
## [9] rlang_0.4.7 stringi_1.5.3 Matrix_1.2-18 reticulate_1.16
## [13] rmarkdown_2.3 tools_4.0.2 stringr_1.4.0 xfun_0.17
## [17] yaml_2.2.1 compiler_4.0.2 htmltools_0.5.0 knitr_1.30
reticulate::py_config()
## python: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/bin/python
## libpython: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome: /home/turgut/.local/share/r-miniconda/envs/r-reticulate:/home/turgut/.local/share/r-miniconda/envs/r-reticulate
## version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
## numpy: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version: 1.18.5
reticulate
Turgut
10/7/2020
knitr::opts_chunk$set(echo = TRUE)
#reticulate::py_config()
CUDA
system('nvidia-smi',intern = T)
## [1] "Wed Oct 7 22:59:29 2020 "
## [2] "+-----------------------------------------------------------------------------+"
## [3] "| NVIDIA-SMI 450.36.06 Driver Version: 450.36.06 CUDA Version: 11.0 |"
## [4] "|-------------------------------+----------------------+----------------------+"
## [5] "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |"
## [6] "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |"
## [7] "| | | MIG M. |"
## [8] "|===============================+======================+======================|"
## [9] "| 0 GeForce RTX 2060 Off | 00000000:01:00.0 On | N/A |"
## [10] "| 0% 50C P2 37W / 170W | 2254MiB / 5931MiB | 1% Default |"
## [11] "| | | N/A |"
## [12] "+-------------------------------+----------------------+----------------------+"
## [13] " "
## [14] "+-----------------------------------------------------------------------------+"
## [15] "| Processes: |"
## [16] "| GPU GI CI PID Type Process name GPU Memory |"
## [17] "| ID ID Usage |"
## [18] "|=============================================================================|"
## [19] "| 0 N/A N/A 1378 G /usr/lib/xorg/Xorg 91MiB |"
## [20] "| 0 N/A N/A 2142 G compiz 93MiB |"
## [21] "| 0 N/A N/A 3206 G ...AAAAAAAAA= --shared-files 69MiB |"
## [22] "| 0 N/A N/A 10624 G /usr/lib/rstudio/bin/rstudio 47MiB |"
## [23] "| 0 N/A N/A 10675 C .../lib/rstudio/bin/rsession 973MiB |"
## [24] "+-----------------------------------------------------------------------------+"
reticulate::py_config()
## python: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/bin/python
## libpython: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome: /home/turgut/.local/share/r-miniconda/envs/r-reticulate:/home/turgut/.local/share/r-miniconda/envs/r-reticulate
## version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
## numpy: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version: 1.18.5
Reticulate Github
from fastai.basics import *
from fastai.callback.all import *
from fastai.vision.all import *
from fastai.medical.imaging import *
import pydicom
import pandas as pd
pneumothorax_source = untar_data(URLs.SIIM_SMALL)
df = pd.read_csv(f"{pneumothorax_source}/labels.csv")
pneumothorax = DataBlock(blocks=(ImageBlock(cls=PILDicom), CategoryBlock),
get_x=lambda x: f"{pneumothorax_source}/{x[0]}",
get_y=lambda x: x[1],
batch_tfms=aug_transforms(size=224))
dls = pneumothorax.dataloaders(df.values)
learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fit_one_cycle(2)
## Error in py_call_impl(callable, dots$args, dots$keywords): RuntimeError: DataLoader worker (pid(s) 11059) exited unexpectedly
learn.fit_one_cycle(2)
## Error in py_call_impl(callable, dots$args, dots$keywords): RuntimeError: DataLoader worker (pid(s) 11092, 11093, 11096) exited unexpectedly
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 lattice_0.20-41 digest_0.6.25 rappdirs_0.3.1 grid_4.0.2
## [6] jsonlite_1.7.1 magrittr_1.5 evaluate_0.14 rlang_0.4.7 stringi_1.5.3
## [11] Matrix_1.2-18 reticulate_1.16-9001 rmarkdown_2.3 tools_4.0.2 stringr_1.4.0
## [16] xfun_0.17 yaml_2.2.1 compiler_4.0.2 htmltools_0.5.0 knitr_1.30
reticulate::py_config()
## python: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/bin/python
## libpython: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome: /home/turgut/.local/share/r-miniconda/envs/r-reticulate:/home/turgut/.local/share/r-miniconda/envs/r-reticulate
## version: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
## numpy: /home/turgut/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version: 1.18.5
@kevinushey Could you share your opinion about this issue, please?
Sorry, I haven't yet had time to investigate (since IIUC reproducing this will require me to do some extra setup to get CUDA working.)
If you were feeling brave, you could try performing a git bisect to see where the problematic behavior from reticulate was introduced.
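For readers unfamiliar with bisecting, the idea is a binary search over the commit history between a known-good and a known-bad commit. The sketch below illustrates only that search logic. The commit labels and the `is_bad` check are hypothetical placeholders, not real reticulate commits, and in practice `git bisect` drives this process for you.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: placeholder labels, not real commit hashes.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad_commit(history, lambda c: history.index(c) >= 5)
print(culprit)
```

Each test halves the remaining range, so eight commits need at most three checks. When a commit in the middle fails for an unrelated reason, `git bisect skip` lets you set it aside rather than mark it good or bad.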
@kevinushey I found a commit that can help:
This selected works fine:
However, starting from this point I faced that issue:
Later, these commits could not load the fastai submodules properly.
# from this it throws Error in vision$gan$ImageBlock(...) : attempt to apply non-function
devtools::install_github("rstudio/reticulate", ref = "f5f8d465")
# from this it throws Error in vision$gan$ImageBlock(...) : attempt to apply non-function
devtools::install_github("rstudio/reticulate", ref = "f3bd8a33")
# from this it throws Error in vision$gan$ImageBlock(...) : attempt to apply non-function
devtools::install_github("rstudio/reticulate", ref = "0292fa29")
# from this it throws Error in vision$gan$ImageBlock(...) : attempt to apply non-function
devtools::install_github("rstudio/reticulate", ref = "4e11cdc5")
# from this it throws Error in vision$gan$ImageBlock(...) : attempt to apply non-function
devtools::install_github("rstudio/reticulate", ref = "dd93bd77")
# from this it throws Error in vision$gan$ImageBlock(...) : attempt to apply non-function
devtools::install_github("rstudio/reticulate", ref = "afc475e8")
But later you fixed the last problem (Error in vision$gan$ImageBlock(...) : attempt to apply non-function), but the data loader worker issue is still there.
Thanks -- that's helpful. However, we no longer use importhook to hook module imports. We instead do this by just overriding the __import__ builtin, which nonetheless may also be the cause of these issues.
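To make the mechanism concrete, here is a minimal sketch of what overriding the `__import__` builtin can look like in plain Python. It is an illustration of the general technique only, not reticulate's actual implementation; the names `install_import_tracker` and `tracking_import` are invented for this example.

```python
import builtins


def install_import_tracker(seen):
    """Wrap builtins.__import__ so each successfully imported name is appended to `seen`.

    Returns a function that restores the original __import__.
    """
    original_import = builtins.__import__

    def tracking_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Delegate to the real import machinery, then record the requested name.
        module = original_import(name, globals, locals, fromlist, level)
        seen.append(name)
        return module

    builtins.__import__ = tracking_import

    def restore():
        builtins.__import__ = original_import

    return restore


seen = []
restore = install_import_tracker(seen)
try:
    import json  # this import statement is routed through tracking_import
finally:
    restore()  # always put the original __import__ back

print("json" in seen)
```

Because every `import` statement in the process passes through the wrapper while it is installed, any side effect inside it is also visible to code that runs imports in unusual contexts, which is one reason such a hook can interact badly with other libraries.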
What do you see with this commit?
https://github.com/rstudio/reticulate/commit/f1329c8de2a8b1fbb0bcc88eb767fde4f2c5accf
Do you see the same issue, or something else?
Yes, same. Error in py_call_impl(callable, dots$args, dots$keywords) : RuntimeError: DataLoader worker (pid(s) 4071, 4072, 4073) exited unexpectedly
Thank you! I really appreciate your taking the time to test and run this down further. Can you help me test one more time, with the development version of reticulate? Install it with:
remotes::install_github("rstudio/reticulate")
Then, please try the following in a new R session:
# ensure this is run before reticulate is loaded
options(reticulate.useImportHook = FALSE)
# load reticulate and run example
library(reticulate)
< ... >
With this, reticulate will no longer attempt to inject its own __import__ hook -- does that make a difference?
> # ensure this is run before reticulate is loaded
> options(reticulate.useImportHook = FALSE)
Yes, with this it has just worked!
That's great news! Although it's unfortunate since it means we'll have to consider another separate way of tracking Python packages as they are imported...
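As an illustration of what a hook-free way of tracking imports could look like, one option is to compare snapshots of `sys.modules` taken before and after some code runs. This is purely a sketch of that idea under my own assumptions; the thread does not say which approach reticulate eventually adopted, and the helper `newly_imported` is hypothetical.

```python
import sys


def newly_imported(before):
    """Return the sorted module names in sys.modules that are not in `before`."""
    return sorted(set(sys.modules) - before)


before = set(sys.modules)  # snapshot of already-loaded module names
import colorsys            # stand-in for whatever user code imports
added = newly_imported(before)
print(added)
```

The trade-off is that a snapshot only shows what was loaded between two points in time, not the moment each import happened, so it cannot trigger work at import time the way a hook can.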
Should we always set it in options or will there be a PR?
For now please keep it set; I'll have to see if there's something else I can do.
Hi @kevinushey. Your fix helped me! :) https://github.com/rstudio/reticulate/issues/885
Now, I do not have to run this: options(reticulate.useImportHook = FALSE)
That's fantastic! I'm glad to hear it.
I am closing this issue, thanks!
Hi, Kevin. I installed the dev version of reticulate and it fails while using the GPU or switching to the GPU. Something is wrong. The CRAN version works fine. I am running this notebook https://github.com/fastai/fastai/blob/master/nbs/61_tutorial.medical_imaging.ipynb on Ubuntu 16 with CUDA 10.1. It fails from both the Python and the R side.