ropensci / ckanr

R client for the CKAN API
https://docs.ropensci.org/ckanr
Other
99 stars 38 forks source link

dplyr interface of `ds_search_sql` #48

Open wush978 opened 9 years ago

wush978 commented 9 years ago

Dear @sckott ,

I think it is possible to implement a subset of dplyr interface for ckanr. I guess we need to use the engine in dplyr to generate the SQL statement and submit the statement to ckan server via ds_search_sql. Does this feature fit this package?

sckott commented 9 years ago

Possibly. Since we don't use dplyr already, I think it would best in Suggests, so that users that don't need it don't have to install it since it is a big dependency.

I don't yet see how it would work. Can you give an example? with pseudocode

wush978 commented 9 years ago

Sure. Moreover, I'll directly implement a prototype. Any progress will be updated here.

sckott commented 9 years ago

thanks

wush978 commented 9 years ago

Oops, I guess I need to push after finishing the prototype. Sorry!

wush978 commented 9 years ago

Dear @sckott ,

https://github.com/ropensci/ckanr/compare/master...wush978:dev/dplyr shows my idea of implementing interface of dplyr. There are two files: dbi.R implements the DBI interface and dplyr.R implements the dplyr's sql interface according to the DBI interface.

This is not a complete implementation, so only the following script works.

library(ckanr)
url <- 'http://demo.ckan.org/'
sql <- 'SELECT * from "f4129802-22aa-4437-b9f9-8a8f3b7b2a53" LIMIT 2'
src <- src_ckan(url)
tb <- tbl(src, sql(sql))
# Then we could do something like this:
# dplyr::filter(tb, ...) %>% compute

If you think this approach is OK, then I will keep implementing the complete interface of dplyr.

sckott commented 9 years ago

@wush978 Sorry, just got back from vacation. Looks good.

If you think this approach is OK, then I will keep implementing the complete interface of dplyr.

What is the rest of the interface?

wush978 commented 9 years ago

@sckott

What is the rest of the interface?

I only implement a minimal example to demonstrate my idea of using DBI to link ds_search_sql to dplyr. Currently, all the following basic verbs in dplyr do not work.

And some verbs of joining...

sckott commented 9 years ago

@wush978 what I'm wondering about is if you have to write versions of select, filter, etc. verbs in this pkg, or if we can just use dplyr itself?

wush978 commented 9 years ago

We use these verbs directly, but we need to guide dplyr how to use the ds_search_sql to submit the SQL statement. That is to say, we do not need to implement these verbs. We override some internal behavior inside dplyr according to https://cran.r-project.org/web/packages/dplyr/vignettes/new-sql-backend.html.

sckott commented 9 years ago

Okay, sounds great!

sckott commented 8 years ago

Any progress on this @wush978

wush978 commented 8 years ago

Thanks for reminding me. I did forget this issue.

I'll try to do some progress during Chinese New Year.

sckott commented 8 years ago

thanks!

wush978 commented 8 years ago

Hi @sckott ,

Today I added some unit tests (https://github.com/ropensci/ckanr/commit/12cb680f95217a325a2a88d6a4f3de609a4ae838b). The basic verbs: filter, arrange, select are workable now. The distinct is failed and I need to check if it is workable with PostgreSQL which is the backend of the CKAN server.

By the way, the non-ascii column name will make the server return http 500, so I only test these features with ascii encoded tables.

And I temporarily make ckanr depend on DBI for simplicity. Otherwise, I need to import all S4 classes and methods from DBI. How do you think about making the package depend on DBI?

sckott commented 8 years ago

hmm, that commit links gives a 404, where should that point to?

on DBI - perhaps. Definitely worth trying, haven't thought about yet if it's not worth it, but I'm guessing it will be fine

wush978 commented 8 years ago

Oops, the link should be: https://github.com/wush978/ckanr/commit/12cb680f95217a325a2a88d64f3de609a4ae838b

sckott commented 8 years ago

@wush978 I had a look, looks very nice. what is the CKAN instance you used for testing?

wush978 commented 8 years ago

I am using http://demo.ckan.org to test

sckott commented 8 years ago

thanks

sckott commented 8 years ago

@wush978 what versions of dplyr/DBI do you have? I'm getting errors trying to run your tests with your branch

sckott commented 8 years ago

e.g,.

tb <- tbl(src, name = name_list[3])
#>  Error in mget(plabels[hasSubclass], env) : invalid first argument
wush978 commented 8 years ago

@sckott

Here it is:

> packageVersion("dplyr")
[1] ‘0.4.3’
> packageVersion("DBI")
[1] ‘0.3.1’
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=zh_TW.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=zh_TW.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=zh_TW.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_TW.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3      ckanr_0.1.0.9190 DBI_0.3.1        testthat_0.10.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2      digest_0.6.8     crayon_1.3.1     assertthat_0.1   R6_2.1.1         jsonlite_0.9.17  magrittr_1.5    
 [8] httr_1.0.0       stringi_1.0-1    curl_0.9.3       rstudioapi_0.3.1 tools_3.2.3      stringr_1.0.0    parallel_3.2.3  
[15] memoise_0.2.1   
wush978 commented 8 years ago

@sckott ,

The unit tests for basic verbs are updated and I also tried left_join.

Hope they work in your computer.

sckott commented 8 years ago

Okay, looks good. I had a dev version of DBI ahead of what you had, works good. now I do get an error on https://github.com/wush978/ckanr/blob/dev/dplyr/tests/testthat/test-dplyr.R#L94 but if I changed to the 2nd element name_list[2] it worked, but aren't data always changing on demo.ckan.org?

Do you think the implementation is nearly complete now?

wush978 commented 8 years ago

Yes, it is nearly complete.

We just need a stable testing environment. Could you give me any suggestion?

And maybe we need a short example to demonstrate this feature. Do you have any idea of where to put the examples?

sckott commented 8 years ago

We just need a stable testing environment. Could you give me any suggestion?

There's many CKAN instances. Not sure which is best, but I imagine there's many that have good up time, and don't have transient datasets in them. See servers(). Perhaps try https://data.noaa.gov/dataset

And maybe we need a short example to demonstrate this feature. Do you have any idea of where to put the examples?

You could create a man file with an appropriate name and include examples in that. e..g., like

#' foo bar foo bar
#' ....... more text .....
#' @name dplyr-interface
NULL
wush978 commented 8 years ago

Thanks.

I'll implement them on the weekend and send a PR.

wush978 commented 8 years ago

@sckott ,

Sorry that I cannot find time to write some code on the last weekend. Do you have any schedule of this issue? Please let me know if we need to be hurry or I'll try to find sometimes on the next weekend to do this.

sckott commented 8 years ago

No rush on this. And thanks for all your work on it

wush978 commented 8 years ago

You're welcome. I'm glad to have a chance to contribute.

florianm commented 7 years ago

@wush978 and @sckott - your work has saved my Department a ton of headache!

We have moved behind a Microsoft single-sign-on authentication wall, and although we can make the CKAN API accessible to our intranet (SSO exempt), the resource URLs can't be easily made SSO exempt. This means that we cannot access our data on CKAN via read.table("ckan resource url")! However, since dplyr can access the datastore directly:

ckan <- ckanr::src_ckan("https://data.dpaw.wa.gov.au/")
dplyr::tbl(src = ckan$con, from = res_id) %>% as_tibble(.)

All we need is for our data to be a clean CSV that uploads into the datastore. Thought you should know that your work has made our lives heaps easier. Much appreciated!

sckott commented 7 years ago

nice!