prioritizr / wdpar

Interface to the World Database on Protected Areas
https://prioritizr.github.io/wdpar
GNU General Public License v3.0

Failed to connect to localhost #7

Closed jcvdav closed 5 years ago

jcvdav commented 5 years ago

Thanks for such a great contribution. This will make using WDPA much easier. (I love wdpa_clean()!)

I am trying to replicate the process on the README, and am getting some errors:

> library(wdpar)
Loading required package: sf
Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
> mlt_raw_pa_data <- wdpa_fetch("Malta", wait = TRUE)
Error in checkError(res) : 
  Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused

Looking at the code, it seems like the problem may be in wdpa_url()? Line 48 calls rd <- RSelenium::remoteDriver(port = 4567L, browserName = "phantomjs"). If I execute that line and then try rd$open(silent = TRUE) I get the same error message as above.

Running pingr::is_online() returns TRUE.
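Since pingr::is_online() only checks general internet access, a more direct test is whether anything is actually listening on the local port that RSelenium tries to reach. Below is a minimal sketch using only base R; the helper name port_is_open() is made up for illustration and is not part of wdpar, RSelenium, or wdman.

```r
# Hypothetical helper (not part of wdpar/RSelenium/wdman): returns TRUE if a
# TCP connection to host:port can be opened, and FALSE otherwise.
port_is_open <- function(host = "localhost", port = 4567L, timeout = 2) {
  con <- tryCatch(
    # socketConnection() raises an error (plus a warning) when the
    # connection is refused, so both are handled here.
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

# Check the port used in the error message above.
port_is_open("localhost", 4567L)
```

If this returns FALSE right after a PhantomJS driver has supposedly been started, the driver process itself probably failed to launch or is listening on a different port, which would point to a problem outside of wdpa_fetch().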


This is my session info

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Mexico.1252 
[2] LC_CTYPE=Spanish_Mexico.1252   
[3] LC_MONETARY=Spanish_Mexico.1252
[4] LC_NUMERIC=C                   
[5] LC_TIME=Spanish_Mexico.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] wdpar_0.0.1 sf_0.7-2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        magrittr_1.5     
 [3] units_0.6-2       rappdirs_0.3.1   
 [5] RSelenium_1.7.5   R6_2.3.0         
 [7] httr_1.4.0        caTools_1.17.1.1 
 [9] tools_3.5.2       grid_3.5.2       
[11] packrat_0.4.9-2   binman_0.1.1     
[13] e1071_1.7-0       DBI_1.0.0        
[15] semver_0.2.0      subprocess_0.8.3 
[17] class_7.3-14      openssl_1.1      
[19] yaml_2.2.0        assertthat_0.2.0 
[21] countrycode_1.1.0 bitops_1.0-6     
[23] curl_3.3          wdman_0.2.4      
[25] compiler_3.5.2    pingr_1.1.2      
[27] classInt_0.3-1    XML_3.98-1.16    
[29] jsonlite_1.6     

jeffreyhanson commented 5 years ago

Thank you! Could you please verify that Selenium and PhantomJS are installed (or try reinstalling them) using the wdman package? The wdman vignette has more information, but you might be able to do this with the following code:

library(wdman)

# Selenium
selServ <- selenium(verbose = TRUE)
selServ$stop()

# PhantomJS
pjsDrv <- phantomjs(verbose = FALSE, check = FALSE)
pjsDrv$log()
pjsDrv$stop()

Could you also please verify if Java is installed on your system? You could do this with the following code:

Sys.which("java")

jcvdav commented 5 years ago

Ah - Selenium and PhantomJS were NOT installed or updated. I re-installed wdman and then ran the suggested code.

Here's what I got.

> install.packages("wdman")
> library(wdman)
> # Selenium
> selServ <- selenium(verbose = TRUE)
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://www.googleapis.com/dow...

Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://www.googleapis.com/dow...

Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://www.googleapis.com/dow...

BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://www.googleapis.com/dow...

Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://www.googleapis.com/dow...

Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://www.googleapis.com/dow...

BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://github.com/mozilla/gec...

Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://github.com/mozilla/gec...

Creating directory: C:\Users\JC\AppData\Local\binm...
Downloading binary: https://github.com/mozilla/gec...

BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
> selServ$stop()
[1] TRUE
> 
> # PhantomJS
> pjsDrv <- phantomjs(verbose = FALSE, check = FALSE)
> pjsDrv$log()
$stderr
character(0)

$stdout
character(0)

> pjsDrv$stop()
[1] TRUE

And yes, Java is installed:

> Sys.which("java")
                                                      java 
"C:\\PROGRA~2\\COMMON~1\\Oracle\\Java\\javapath\\java.exe" 

However, running wdpa_fetch() still returns the same error message:

> library(wdpar)
Loading required package: sf
Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
> mlt_raw_pa_data <- wdpa_fetch("Malta", wait = TRUE)
Error in checkError(res) : 
  Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused

jeffreyhanson commented 5 years ago

Hmm, could you please try restarting your computer? Sometimes R leaves zombie PhantomJS processes running, which can cause problems (unfortunately, I haven't managed to work out how to kill these in a manner that is agnostic to the host operating system). After restarting, please open R with administrator privileges (this might require running R outside of RStudio), then try running the code below:

library(wdman)
library(RSelenium)
pjs <- wdman::phantomjs(verbose = FALSE)
rd <- RSelenium::remoteDriver(port = 4567L, browserName = "phantomjs")
rd$open(silent = FALSE)
rd$close()
pjs$stop()
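As a possible alternative to restarting, leftover PhantomJS processes could be terminated from within R. The sketch below is not truly OS-agnostic: it simply branches on the operating system and calls taskkill (Windows) or pkill (Linux/macOS). The function names are hypothetical and not part of wdman or wdpar, and note that it ends every process named phantomjs, including any that are unrelated to R.

```r
# Hypothetical helper: build the OS-specific command that terminates every
# process named phantomjs.
phantomjs_kill_command <- function(os = .Platform$OS.type) {
  if (identical(os, "windows")) {
    # taskkill: /F forces termination, /IM matches by image name,
    # /T also ends any child processes.
    list(command = "taskkill", args = c("/F", "/IM", "phantomjs.exe", "/T"))
  } else {
    # pkill -x matches the exact process name.
    list(command = "pkill", args = c("-x", "phantomjs"))
  }
}

# Hypothetical helper: run the command and return its exit status invisibly.
kill_phantomjs <- function() {
  cmd <- phantomjs_kill_command()
  # A non-zero exit status usually just means no matching process was found.
  status <- suppressWarnings(
    system2(cmd$command, cmd$args, stdout = FALSE, stderr = FALSE)
  )
  invisible(status)
}
```

Calling kill_phantomjs() before wdman::phantomjs() would, under these assumptions, free up the port that a stale driver might still be holding.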

I'm sorry you're experiencing these issues. I was playing around on my Windows 7 machine with a fresh R installation and wdpa_fetch was throwing the same error message you are getting, but after running the PhantomJS/RSelenium installation commands and manually creating the drivers with wdman a few times (as you did in your previous post), wdpa_fetch started working.

jeffreyhanson commented 5 years ago

I suppose one option might be to add functionality to wdpa_fetch so you can tell it to use a Docker container running Selenium/PhantomJS (instead of trying to run it on the host operating system). Could you please try following this tutorial and see if you can use Docker with RSelenium to perform the web scraping: https://callumgwtaylor.github.io/blog/2018/02/01/using-rselenium-and-docker-to-webscrape-in-r-using-the-who-snake-database/

jcvdav commented 5 years ago

Hi @jeffreyhanson,

Thanks for looking into this. I ran the first chunk and still get some errors. This is what I got:

> library(wdman)
> library(RSelenium)
> pjs <- wdman::phantomjs(verbose = FALSE)
> rd <- RSelenium::remoteDriver(port = 4567L, browserName = "phantomjs")
> rd$open(silent = FALSE)
[1] "Connecting to remote server"
Error in checkError(res) : 
  Undefined error in httr call. httr output: Failed to connect to localhost port 4567: Connection refused
> rd$close()
Error in checkError(res) : 
  Undefined error in httr call. httr output: length(url) == 1 is not TRUE
> pjs$stop()
[1] TRUE

In case it matters, I am running on Windows 10. I will try the Docker approach now and will report back soon.

jcvdav commented 5 years ago

So, tried the Docker approach following the post, and ran into some other problems.

Docker Desktop cannot be installed on Windows 10 Home. Apparently this is because Hyper-V was not a thing on Windows 10 Home back in 2016. It says that I need Second Level Address Translation (SLAT).

However, if I run systeminfo I get:

Hyper-V Requirements:
           VM Monitor Mode Extensions: Yes
           Virtualization Enabled In Firmware: Yes
           Second Level Address Translation: Yes
           Data Execution Prevention Available: Yes

Which indicates not only that Hyper-V is enabled, but also that I have SLAT....

I don't think I'll be able to test it this way. Looking at #6, I may be able to use wdpa_read() and work with a fresh January 2019 download of WDPA.

I'd be happy to contribute to this project and try to solve this issue to make sure wdpar works on all systems. Feel free to close for now.

jeffreyhanson commented 5 years ago

Ah ok - that's a pity - thank you very much for trying out the Docker approach and sharing your experiences.

I'm sorry, I don't have any other ideas for fixing this at the moment. Although one option would be to simply "guess" the download links because they follow a standard pattern (IIRC some combination of country name, month, and year), this approach is unsatisfying because when protectedplanet.net releases a new version of the global data set (about once a month) it will only create country-specific subsets of the data (e.g. shapefile data just for Spain) when they are requested for download from the website. So, if we were to simply try downloading the data using a "guessed" download link then there is no guarantee that the data will be available for download (because it may not exist yet). Additionally, before trying the RSelenium R package, I originally tried using the rvest R package but I was unable to trigger protectedplanet.net's process for creating a country-specific subset of the data using this package (perhaps some JavaScript needs to be executed in a mock browser?).
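To make the "guessed link" idea and its limitation concrete, here is a rough sketch. The URL template is entirely hypothetical (it points at example.org, and the real protectedplanet.net pattern is not given in this thread), and both function names are invented for illustration. The month abbreviation is taken from month.abb rather than format(date, "%b") so that the result does not depend on the session locale.

```r
# Hypothetical helper: build a candidate download link from a country code
# and a date. The template is a placeholder, NOT the real link format.
guess_wdpa_url <- function(
    iso3, date = Sys.Date(),
    template = "https://example.org/downloads/WDPA_%s%s_%s-shapefile.zip") {
  # month.abb is always English, unlike format(date, "%b").
  month <- month.abb[as.integer(format(date, "%m"))]
  year <- format(date, "%Y")
  sprintf(template, month, year, iso3)
}

# Hypothetical helper: TRUE only if a HEAD request for the URL returns
# HTTP 200. Requires the httr package.
url_exists <- function(url) {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("the httr package is required")
  }
  res <- tryCatch(
    httr::HEAD(url, httr::timeout(10)),
    error = function(e) NULL
  )
  !is.null(res) && httr::status_code(res) == 200L
}

candidate <- guess_wdpa_url("MLT", as.Date("2019-01-15"))
```

Even with the correct template, url_exists(candidate) could return FALSE for a country whose subset has not yet been requested through the website, so a check like this could only serve as a fast path before falling back to the browser-based approach.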

Ok, I'll close this issue now, but if you (or anyone else reading this) have any ideas for fixing this then please reopen it.