Open ajdamico opened 7 years ago
hrm. .onAttach() does not get called when you do that and that's where V8 gets initialized. However, I agree that this should work and it shld be as simple as a test for the pkg global being initialized when that function is called.
Huge thanks for finding this edge case. I'll try to get a patch on github tonight.
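For readers following along, here is a minimal sketch of the kind of "is the pkg global initialized?" check described above. It is not the actual curlconverter patch: `.pkgenv`, `init_ctx()` and `get_ctx()` are hypothetical names, and `init_ctx()` is only a stand-in for creating the V8 context.

```r
# Hypothetical package-level state; all names here are illustrative only.
.pkgenv <- new.env(parent = emptyenv())
.pkgenv$ctx <- NULL

# Stand-in for the expensive setup that .onAttach() would normally perform
# (in curlconverter this would be creating the V8 context).
init_ctx <- function() {
  list(created = Sys.time())
}

# Return the context, creating it on first use. Because the check happens
# at call time, pkg::fn() works even if library(pkg) was never called.
get_ctx <- function() {
  if (is.null(.pkgenv$ctx)) {
    .pkgenv$ctx <- init_ctx()
  }
  .pkgenv$ctx
}
```

Any exported function that needs the context would call `get_ctx()` instead of touching the global directly, so it no longer matters whether the package was attached or only loaded.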
hi, thanks. i guess i'll go with this workaround to eliminate the cran build note until you push the next version to cran :)
https://github.com/ajdamico/lodown/commit/512ed291b126f29f11010e67b3a9d1f1d76b2a7c
thank you for making this possible
# automatically load the world values survey
devtools::install_github("ajdamico/lodown")
library(lodown)
lodown( "wvs" , output_dir = "C:/My Directory/WVS" )
OH wait. I get the use-case you're doing now. You really don't need to use curlconverter in a pkg that way. If you do just straighten():
library(curlconverter)
browserGET <- "curl 'http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp' -H 'Host: www.worldvaluessurvey.org' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1'"
you get back a list:
str(straighten(browserGET))
## List of 1
## $ :List of 5
## ..$ url : chr "http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp"
## ..$ method : chr "get"
## ..$ headers :List of 6
## .. ..$ Host : chr "www.worldvaluessurvey.org"
## .. ..$ User-Agent : chr "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0"
## .. ..$ Accept : chr "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
## .. ..$ Accept-Language : chr "en-US,en;q=0.5"
## .. ..$ Connection : chr "keep-alive"
## .. ..$ Upgrade-Insecure-Requests: chr "1"
## ..$ url_parts:List of 9
## .. ..$ scheme : chr "http"
## .. ..$ hostname: chr "www.worldvaluessurvey.org"
## .. ..$ port : NULL
## .. ..$ path : chr "WVSDocumentationWV4.jsp"
## .. ..$ query : NULL
## .. ..$ params : NULL
## .. ..$ fragment: NULL
## .. ..$ username: NULL
## .. ..$ password: NULL
## .. ..- attr(*, "class")= chr [1:2] "url" "list"
## ..$ orig_curl: chr "curl 'http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp' -H 'Host: www.worldvaluessurvey.org' -H 'User-Agent: Mozilla/5."| __truncated__
## ..- attr(*, "class")= chr [1:2] "cc_obj" "list"
## - attr(*, "class")= chr [1:2] "cc_container" "list"
Which means you can either use dput() to capture that structure or saveRDS() to turn it into an R data file which you can have auto-loaded in your pkg.
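A small sketch of both options, using base R only. `req_spec` is a hand-written stand-in for the list that straighten() returns (only a few fields, no classes), and the temp files are just for illustration:

```r
# Stand-in for the result of straighten(browserGET); trimmed for brevity.
req_spec <- list(list(
  url = "http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp",
  method = "get",
  headers = list(Host = "www.worldvaluessurvey.org")
))

# Option 1: dput() writes R source code that recreates the object,
# which can be pasted into a package source file or read back with dget().
src_file <- tempfile(fileext = ".R")
dput(req_spec, file = src_file)
from_src <- dget(src_file)

# Option 2: saveRDS() writes a single serialized object to disk,
# which readRDS() restores.
rds_file <- tempfile(fileext = ".rds")
saveRDS(req_spec, rds_file)
from_rds <- readRDS(rds_file)
```

In a package, the .rds file would typically be shipped under inst/ and located at run time with system.file(); that layout is a suggestion, not something this thread prescribes.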
You're prbly going the next step and doing a make_req():
straighten(browserGET) %>%
make_req() -> req
One thing that I've been struggling to make clearer is that immediately after make_req() is called, the contents (source code) of the function it creates are placed on the clipboard. i.e. if you cmd-v (mac) or ctrl-v (win) in the editor you'll get the source code for the function placed right where the cursor is. In this case:
httr::VERB(verb = "GET", url = "http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp",
httr::add_headers(Host = "www.worldvaluessurvey.org",
`User-Agent` = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0",
Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
`Accept-Language` = "en-US,en;q=0.5",
Connection = "keep-alive",
`Upgrade-Insecure-Requests` = "1"))
You could also get that by just typing req[[1]] (no parens) at the R console:
function ()
httr::VERB(verb = "GET", url = "http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp",
httr::add_headers(Host = "www.worldvaluessurvey.org", `User-Agent` = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0",
Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
`Accept-Language` = "en-US,en;q=0.5", Connection = "keep-alive",
`Upgrade-Insecure-Requests` = "1"))
<environment: 0x10675c0d8>
that adds some cruft which is why i made it "auto copy to clipboard".
That particular curl translation can be simplified to (when i do this for my own projects i iteratively remove individual cookies and headers until i get the minimum viable httr verb call I can):
httr::GET(url = "http://www.worldvaluessurvey.org/WVSDocumentationWV4.jsp")
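The "remove one header at a time" routine can be sketched as a small helper. This is an illustration of the idea, not part of curlconverter: `minimize_headers()` and `still_works()` are hypothetical names, and the predicate is left to the caller so the loop itself needs no network access.

```r
# Drop headers one at a time, keeping a removal only if the request
# still behaves acceptably according to the caller-supplied predicate.
minimize_headers <- function(headers, still_works) {
  for (name in names(headers)) {
    trial <- headers[names(headers) != name]
    if (still_works(trial)) {
      headers <- trial
    }
  }
  headers
}

# Example with a fake predicate that pretends the server only needs Host.
# A real predicate might wrap something like
#   httr::status_code(httr::GET(url, do.call(httr::add_headers, h))) == 200
# (an assumption; a 200 alone does not prove the body is what you wanted).
all_headers <- list(
  Host = "www.worldvaluessurvey.org",
  `Accept-Language` = "en-US,en;q=0.5",
  Connection = "keep-alive"
)
needs_host <- function(h) "Host" %in% names(h)
minimal <- minimize_headers(all_headers, needs_host)
```

The result depends on the order headers are tried in, so this finds a small working set rather than a guaranteed smallest one.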
I'm still going to make straighten() work via :: calling but I wanted to make sure you knew ^^ since it's unlikely you really do need to use curlconverter within a pkg.
I think you're going to need to use a different target. A great deal of the content on that page is dynamically loaded at run-time and the center column (which has the citation and data files) that you want to target is also an iframe:
(apologies for the faint highlighting due to the dark theme but it shld be visible).
The next problem is that more of the contents is loaded via another call to a javascript file:
And, your final problem is that the js file in ^^ loads the actual content but:
All of the hrefs are wrapped in a call to DocDownloadLicense() which dynamically builds the form you're prbly familiar with:
Without something like RSelenium or seleniumPipes you're not going to be able to automate this and you can't embed either in an R package since you need a back-end selenium grid, standalone selenium server or phantomjs running live to do the work.
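To make the iframe point concrete, here is a rough sketch of what a plain HTTP fetch sees. The HTML string is a made-up stand-in for the real page, and the regex is used only to keep the example base-R; in practice an HTML parser such as xml2 would be the better tool.

```r
# Made-up stand-in for the static HTML a plain GET would return: the outer
# page carries only an iframe reference, not the file links themselves.
html <- '<html><body><div id="left"></div><iframe src="frame.jsp"></iframe></body></html>'

# Pull out the src attribute of every iframe tag.
tags <- regmatches(html, gregexpr('<iframe[^>]*src="[^"]*"', html))[[1]]
iframe_src <- sub('.*src="([^"]*)"', "\\1", tags)

# The download links are not in the outer document at all.
has_links <- grepl("<a ", html, fixed = TRUE)
```

Fetching the iframe URL separately only gets one step further; content injected by javascript after load still requires something that actually executes the script.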
i think the current version of lodown works without issue?
On Jan 16, 2017 12:56 PM, "boB Rudis" notifications@github.com wrote:
I think you're going to need to use a different target. A great deal of the content on that page is dynamically loaded at run-time and the center column (which has the citation and data files) that you want to target is also an iframe:
[image: image] https://cloud.githubusercontent.com/assets/509878/21983556/ed7a6810-dbbf-11e6-81b9-c4e7f93dd2d3.png
(apologies for the faint highlighting due to the dark theme but it shld be visible).
The next problem is that more of the contents is loaded via another call to a javascript file:
[image: image] https://cloud.githubusercontent.com/assets/509878/21983640/483b000c-dbc0-11e6-8743-374daf6c6c50.png
And, your final problem is that the js file in ^^ loads the actual content but:
[image: image] https://cloud.githubusercontent.com/assets/509878/21983687/7e8e7468-dbc0-11e6-9e7a-f72b106ad789.png
All of the hrefs are wrapped in a call to DocDownloadLicense() which dynamically builds the form you're prbly familiar with:
[image: image] https://cloud.githubusercontent.com/assets/509878/21983774/d58ab39e-dbc0-11e6-9c53-7fcd71a139ad.png
Without something like RSelenium or seleniumPipes you're not going to be able to automate this and you can't embed either in an R package since you need a back-end selenium grid, standalone selenium server or phantomjs running live to do the work.