vincenzzimmer opened 4 years ago
@vincenzzimmer It might be worthwhile to try changing your temp directory location and see whether that helps (i.e., something like https://www.howtogeek.com/285710/how-to-move-windows-temporary-folders-to-another-drive). I don't have much experience on Windows though, so for now that's all I can think of. Also, you can check the permission attributes of your current temp directory to see if any read/write permission is different from normal.
Alternatively, you can try adding the following to your `Renviron.site` config file and restarting RStudio for the change to take effect:

```
TMPDIR=<writable location>
TMP=<writable location>
TEMP=<writable location>
```
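To check whether the change took effect, here is a small sketch that prints the temp directory the current R session resolved at startup and verifies it is writable by round-tripping a file (the probe filename is just an illustration):

```r
# Check which temp directory this R session resolved at startup
print(tempdir())
print(Sys.getenv(c("TMPDIR", "TMP", "TEMP")))

# Verify the location is writable by round-tripping a small file
probe <- file.path(tempdir(), "write_probe.txt")
writeLines("ok", probe)
contents <- readLines(probe)
file.remove(probe)
```

Note that `tempdir()` is fixed at session startup, so the environment variables must be set before R launches.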
Meanwhile, because I have never seen issues like this on macOS or Linux, I suspect it's an OS-specific problem.
Thank you for the quick response. I have set the TMP and TEMP environment variables to another user-independent folder (D:\temp) and gave my user, as well as the user running the Spark services, all available permissions. The error message is still the same.
I looked for the log file at the location shown in the message, and it was there. I could not open it while RStudio was running; after closing RStudio, I could open it without any issues. Maybe the problem is not a permission issue but one caused by the log being locked by a process accessing it in parallel.
I am sure that this is an OS-specific issue. Unfortunately, I have no Linux OS available yet.
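One possible workaround for a log that is locked by another process is to copy it to a scratch path and read the copy. This is only a sketch under the assumption that the Windows lock blocks opening the original handle but not copying it (`read_locked_log` is a hypothetical helper, not part of sparklyr):

```r
# Hypothetical workaround: copy the locked log to a scratch path and read the
# copy, assuming the lock only prevents opening the original file directly.
read_locked_log <- function(log_path) {
  scratch <- tempfile(fileext = ".log")
  if (!file.copy(log_path, scratch, overwrite = TRUE)) {
    stop("could not copy log file: ", log_path)
  }
  on.exit(unlink(scratch))
  readLines(scratch)
}
```

If the lock is fully exclusive, `file.copy()` would fail too, in which case this approach does not help.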
@vincenzzimmer Wow, thanks for posting that error message :) I never realized two processes cannot access the log file at the same time on Windows, even though one of them was read-only (i.e., `file(con, "r")`, etc.).
So yeah... that at least gives me more of a clue about how to possibly work around this type of problem.
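Since the lock appears to be transient (the file becomes readable once RStudio releases it), another workaround sketch is to retry the read with a short backoff instead of failing on the first attempt. `read_log_with_retry` is a hypothetical helper, not sparklyr's actual implementation:

```r
# Sketch of a retry loop around a file(con, "r")-style read: if another
# process currently holds the log, wait briefly and try again.
read_log_with_retry <- function(path, attempts = 5L, wait = 0.5) {
  for (i in seq_len(attempts)) {
    lines <- tryCatch(readLines(path, warn = FALSE), error = function(e) NULL)
    if (!is.null(lines)) return(lines)
    Sys.sleep(wait)
  }
  stop("log file still unreadable after ", attempts, " attempts: ", path)
}
```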
@yl790 Thank you for your support. Please tell me if I can help with more information or by running some tests.
The problem still persists. I am facing it with Windows 10, R 3.6.3, RStudio 1.3.1093, and sparklyr 1.4.0. I noticed that the *_spark.log file gets created without problems in a folder where I have read/write rights, but as soon as sparklyr/spark_connect tries to open this file, I get the permission denied error. I can also confirm what vincenzzimmer wrote: I can delete the files as soon as I close RStudio. On my Linux system (Ubuntu), it works completely fine.
I just tried it with sparklyr version 1.2.0 and RStudio 1.2.5033 and it now works again.
@not4everybody I think there is definitely some weird OS-specific race condition happening.
As far as I can tell, neither sparklyr nor Apache Spark implements anything OS-specific when it comes to creating _spark.log and redirecting Spark log entries to that file, so the fact that this only happens on Windows really puzzles me. Also, there has been no change in how the _spark.log file is created; it has been the same implementation from sparklyr 1.2 through 1.4.
I'll let you know if I find some possible fix or workaround for this problem.
I have the same error. Any update? Thanks. More information:

```
22/11/08 16:08:06 ERROR sparklyr: Gateway (5142) failed calling getOrCreate on 8: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x67c33749) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x67c33749
```
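That `IllegalAccessError` about `sun.nio.ch.DirectBuffer` is typical of running Spark on Java 16+, where the JVM module system blocks reflective access Spark relies on; it is a different failure from the Windows log-file lock discussed above. A commonly suggested workaround (hedged: verify it applies to your Spark/JDK combination, and prefer Java 8/11 or a Spark build that supports your JDK) is to open that package via spark-submit's `--driver-java-options`, which sparklyr exposes through a `sparklyr.shell.*` config key:

```r
library(sparklyr)

# Open java.base/sun.nio.ch to Spark's unnamed module on Java 16+.
config <- spark_config()
config[["sparklyr.shell.driver-java-options"]] <-
  "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED"

# sc <- spark_connect(master = "local", config = config)
```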
I have the same error:
```r
connection <- sparklyr::spark_connect(
  master = "spark://<ip address>:<port number>",
  spark_home = "C:/Users/admin_mschuemi/AppData/Local/spark/spark-2.4.3-bin-hadoop2.7"
)
# Error in file(con, "r") : cannot open the connection
# In addition: Warning message:
# In file(con, "r") :
#   cannot open file 'D:/temp/Rtemp\Rtmp0MzxYf\file1994509d3791_spark.log': Permission denied
```
I've changed the temp folder settings as proposed, which had no effect. I definitely have write access to that location.
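Since the permission-denied error happens when reading the `*_spark.log` temp file, one thing that might be worth trying is routing the gateway log to the console instead. This is an assumption-laden sketch: `sparklyr.log.console` is a sparklyr config option per its documentation, but whether it bypasses the locked-file read in your version is unverified:

```r
library(sparklyr)

# Assumption: sending Spark's gateway log to the console may avoid the
# locked *_spark.log temp file read entirely.
config <- spark_config()
config[["sparklyr.log.console"]] <- TRUE

# sc <- spark_connect(master = "spark://<ip address>:<port number>",
#                     config = config)
```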
```
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sparklyr_1.7.8

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13   magrittr_2.0.3    tidyselect_1.1.2  R6_2.5.1          rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3
 [8] httr_1.4.3        dplyr_1.0.9       tools_4.1.1       parallel_4.1.1    config_0.3.1      utf8_1.2.2        cli_3.3.0
[15] DBI_1.1.2         withr_2.5.0       dbplyr_2.2.1      askpass_1.1       htmltools_0.5.2   ellipsis_0.3.2    openssl_2.0.2
[22] yaml_2.3.5        assertthat_0.2.1  digest_0.6.29     rprojroot_2.0.3   tibble_3.1.7      lifecycle_1.0.1   forge_0.2.0
[29] crayon_1.5.1      tidyr_1.2.0       purrr_0.3.4       base64enc_0.1-3   htmlwidgets_1.5.4 vctrs_0.4.1       glue_1.6.2
[36] compiler_4.1.1    pillar_1.7.0      r2d3_0.2.6        generics_0.1.2    jsonlite_1.8.0    pkgconfig_2.0.3
```
This seems to be a known issue on Windows (the TEMP/TMP folder created by Java, as per Stack Overflow). I installed it on Ubuntu (WSL) instead, and it worked well (R/Python/SQL).
When I try connecting to a Spark master (standalone) via

```r
sc <- spark_connect(master = "spark://myip:7077")
```

I get the following error:

```
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file 'C:\Users\ADM_DE~2\AppData\Local\Temp\2\RtmpEH4UkI\file1c6c243e351dspark.log': Permission denied
```
I have already read plenty of posts/issues on GitHub and Stack Overflow related to this kind of error message. I tried to run the code from RStudio, Rterm, and with spark-submit. I tried running it as administrator and also installed the latest version of sparklyr from GitHub. I even uninstalled my complete R installation to install everything from scratch (now R 4.0.0; before R 3.6.1). Now I don't know what else I could try and am therefore rather convinced that this must be a bug.
Here is some information on my environment:
I have a standalone Spark running on a Windows Server 2016 machine, with master and worker on the same machine. The Spark version is spark-2.4.4-bin-without-hadoop, along with hadoop-2.8.2 (to connect Spark with MinIO according to link).
```
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sparklyr_1.2.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      rstudioapi_0.11   magrittr_1.5      tidyselect_1.0.0  R6_2.4.1          rlang_0.4.6
 [7] httr_1.4.1        dplyr_0.8.5       tools_4.0.0       parallel_4.0.0    config_0.3        DBI_1.1.0
[13] withr_2.2.0       dbplyr_1.4.3      askpass_1.1       htmltools_0.4.0   ellipsis_0.3.0    openssl_1.4.1
[19] yaml_2.2.1        assertthat_0.2.1  rprojroot_1.3-2   digest_0.6.25    tibble_3.0.1      lifecycle_0.2.0
[25] forge_0.2.0       crayon_1.3.4      purrr_0.3.4       base64enc_0.1-3   htmlwidgets_1.5.1 vctrs_0.2.4
[31] glue_1.4.0        compiler_4.0.0    pillar_1.4.4      generics_0.0.2    r2d3_0.2.3        backports_1.1.6
[37] jsonlite_1.6.1    pkgconfig_2.0.3
```