ropensci / popler

The R package to browse and query the popler database
https://docs.ropensci.org/popler
MIT License

We could consider running popler as an API #50

Open bochocki opened 7 years ago

bochocki commented 7 years ago

Just an idea (definitely not urgent), but I think if we set up popler as a package that calls an external API, we might be able to do some cool things.

Basically, the R code in the popler package would only generate a call to an API, and the API would run all of the query code on the server side. Some potential benefits would be:

  1. We wouldn't need to store a username and password for the server in the package.
  2. By handling all the queries remotely, we could (potentially?) keep the whole database in an .RData file on the server and load it into the server-side R process when the API is called. Calling dplyr on tables that are already in memory is orders of magnitude faster, which would speed things up on the user end. Creating a new summary table would be quicker too.
  3. Doing everything on the server side would enable us to get the exact size of the file to be downloaded -- something we currently can't do -- and return this to the user before download.
  4. Using an API that keeps most of the heavy lifting on the server side would enable users to query the database using languages other than R. Since generating an API call mostly requires string manipulation, it wouldn't be too difficult to build a popler Python library, for example.
  5. Users would be required to install very few dependencies to run popler. The only one that comes to mind is httr, which we would use to make the API call.
  6. It would be easier to get popler approved on CRAN (and maybe rOpenSci?) if the package were a lightweight version that just did all of the user-facing things (browse, dictionary, citations, etc).
  7. We could update the server side ("heavy lifting") code independently of the CRAN package, potentially enabling us to have a more stable package on CRAN.
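To make point 4 a bit more concrete, here is a rough sketch of what "mostly string manipulation" could look like from a Python client. Everything named here is an assumption for illustration only: the base URL, the `query` and `browse` endpoints, and the `project_id` and `format` parameters are hypothetical, since no API exists yet and nothing in this issue defines one.

```python
from urllib.parse import urlencode

# Hypothetical base URL; this issue does not say where the API would be hosted.
BASE_URL = "https://popler.example.org/api"


def build_query_url(endpoint, **filters):
    """Build the URL for a (hypothetical) popler API endpoint.

    Filters set to None are dropped, and list values are expanded into
    repeated query parameters. Keys are sorted so the URL is deterministic.
    """
    params = {key: value for key, value in sorted(filters.items())
              if value is not None}
    query = urlencode(params, doseq=True)
    url = f"{BASE_URL}/{endpoint.strip('/')}"
    return f"{url}?{query}" if query else url


# Illustrative call; the parameter names are made up for this sketch.
url = build_query_url("query", project_id=[1, 2], species=None, format="json")
print(url)
```

The client never touches the database or needs credentials. It only assembles a URL, so the same few lines could be ported to any language with an HTTP library.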

This would be a major shift in the way popler operates, but it wouldn't involve much recoding. Most of the heavy-lifting code would just be moved to the server. We could maintain that code as a separate R package just for ease of portability/installing/updating/etc. There's even a package, plumber, that's used to set up APIs for R.