posit-dev / r-shinylive

https://posit-dev.github.io/r-shinylive/

Adding support to connect remote databases #55

Open jarodmeng opened 6 months ago

jarodmeng commented 6 months ago

I understand that there are challenges posed by Wasm when making remote database calls. But the ability to load data from remote databases, compute on it dynamically, and serve the results all natively in a browser would truly make shinylive much more powerful. Currently it seems that most shinylive use cases and examples rely on small fixed datasets, either shipped with base R or with some pre-built packages. That pattern doesn't match how users work in the wild, which often involves much larger datasets, and often that data lives in a remote database rather than on the user's machine.

Having said that, I'm truly inspired by what shinylive can already do! Thanks for the amazing work.

seanbirchall commented 6 months ago

You can do this currently; you just need to build the right APIs/proxies and allow CORS. There are a few great functions/packages that already work in webR and let you get data (utils::download.file(), jsonlite::fromJSON(), DBI with SQLite).

GET data: Right now it looks like using some proxy is the only way to do this. For example, I use AWS quite a bit for my cloud needs and have an Aurora PostgreSQL database where I store data. I built an API to this database using Lambda, which accepts SQL queries and returns data (for most cases this is not recommended). I can then protect this API using Cognito + API Gateway to make sure only authenticated users are making calls to it. This example is all 'serverless', similar to shinylive, but the same could be done using any EC2 equivalent. https://github.com/posit-dev/r-shinylive/issues/28 is actually relevant here, as Cognito and other Cognito-like services pass a URL query parameter back after users are signed in and redirected.
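To make the GET side concrete, here is a minimal sketch of querying such a proxy from webR. The endpoint URL, parameter names, and response shape are all hypothetical and depend entirely on how you build the Lambda/API Gateway side; the service must be served over HTTPS with CORS allowing the shinylive app's origin.

# Hypothetical Lambda/API Gateway proxy that accepts a SQL query and an
# auth token as query parameters and returns JSON rows (illustrative only)
query_db <- function(sql, token) {
  url <- paste0(
    "https://example.execute-api.us-east-1.amazonaws.com/prod/query",
    "?sql=", utils::URLencode(sql, reserved = TRUE),
    "&token=", token
  )
  # webR routes this HTTPS request through the browser's networking layer
  jsonlite::fromJSON(url)
}

df <- query_db("SELECT * FROM sales LIMIT 100", token = "<auth-code>")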

POST/send data: Another way you can send data to another service is by shimming an R function to run JavaScript code. I use the function below to send POST requests with data to another service. Under the hood all it is doing is sending a JS fetch request; again, the receiving service needs CORS open to accept it.

rfetch <- function(url, ...) {
  if (missing(url)) {
    stop("url argument required")
  }

  if (missing(...)) {
    warning("No additional arguments provided, sending GET request")
    code <- paste0("fetch('", url, "', {method: 'GET'})")
  } else {
    # Serialise the fetch options (method, headers, body, ...) into a JSON object
    args <- jsonlite::toJSON(list(...), auto_unbox = TRUE)
    code <- paste0("fetch('", url, "', ", args, ")")
  }

  message(code)

  # Parse the response body as JSON and log it to the browser console
  code <- paste0(
    code,
    ".then(function(response) { return response.json(); })",
    ".then(function(data) { console.log(data); });"
  )

  shinyjs::runjs(code = code)
}
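A hypothetical call might look like the sketch below; the endpoint and payload fields are illustrative, and the receiving service must allow CORS from the app's origin.

# Hypothetical endpoint; names and fields are illustrative
rfetch(
  "https://api.example.com/ingest",
  method  = "POST",
  headers = list("Content-Type" = "application/json"),
  body    = '{"id": 1, "value": "abc"}'
)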

What you're actually probably looking for is https://github.com/posit-dev/r-shinylive/issues/46: in the future you might be able to mount a large file system which could host data files, but also databases like SQLite/DuckDB. That would allow you to host a shinylive app behind a firewall/VPC and query something like a SQLite database directly, without any proxy service. Again, it is not recommended to mount network file systems to a client web app, but for the right use case it could be okay.

Right now it's even possible to mount smaller remote file system images via WORKERFS, though you need to download the entire file system image first. The solution above, if it ever becomes possible via NODEFS, would hopefully get around this, likely making it possible to mount very large network file systems.
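As a sketch of the WORKERFS route, webR exposes a mount() function for attaching a pre-built filesystem image served over HTTP; the URL and mountpoint below are illustrative, and the image (a .data / .js.metadata pair) has to be built ahead of time, e.g. with the rwasm package.

# Illustrative sketch, assuming webR's mount() API and a pre-built image
webr::mount(
  mountpoint = "/data",
  source = "https://example.com/output"
)
list.files("/data")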

But I agree with you, this is all amazing work and an exciting time to be making R Shiny apps. It's like every pain point I ever had selling an org on using R Shiny is gone.

jarodmeng commented 6 months ago

@seanbirchall Thanks for the info! I think the solutions you mention are still quite difficult for most users to implement. The core challenge is that although running R in Wasm "feels" like running R on a local machine, it's actually a "virtual" environment. Because that virtual environment lives in the browser, there are restrictions on things like outbound/inbound network calls. Those challenges can't be solved easily by individual users, so I'm hoping some canonical, native solutions can be embedded in the shinylive system.

georgestagg commented 6 months ago

In my opinion, right now the most natural way for an R user to make use of external data in Shinylive is by exposing the data as a web API using plumber (bearing in mind any security considerations).
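A minimal plumber API exposing a dataset might look like the sketch below; the endpoint name and the permissive CORS filter are illustrative only, and in production you would restrict the allowed origin and add authentication.

# plumber.R -- illustrative sketch, not production-ready
library(plumber)

#* Allow cross-origin requests so a browser-based shinylive app can call us
#* (a wildcard origin is for demonstration only)
#* @filter cors
function(res) {
  res$setHeader("Access-Control-Allow-Origin", "*")
  plumber::forward()
}

#* Return a dataset as JSON
#* @get /mtcars
function() {
  mtcars
}

# Run with: plumber::pr_run(plumber::pr("plumber.R"), port = 8000)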

As Sean mentions, once your data is exposed over HTTPS it can be loaded into the R session using tools such as download.file() or url(), which have already been reconfigured by webR to use a JavaScript network connection. This is allowed even in the restricted browser environment, as long as we are using HTTPS and CORS has been configured.
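On the client side, reading from such an endpoint can be as simple as the following sketch; the URLs are illustrative placeholders for wherever your API is hosted.

# Hypothetical HTTPS endpoint with CORS configured
df <- jsonlite::fromJSON("https://example.com/mtcars")

# Or download into webR's virtual filesystem first, then read locally
utils::download.file("https://example.com/data/mydata.csv", "mydata.csv")
df2 <- read.csv("mydata.csv")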

This is, of course, not a real direct database connection. But it is probably the most elegant technique in the short term. We are thinking about how this kind of thing should work in the future, but it is going to take some time for us to investigate possible solutions.

There is a webR issue tracking this, including a little more discussion, here: https://github.com/r-wasm/webr/issues/129.

seanbirchall commented 6 months ago

@jarodmeng Yeah, definitely not the simplest, but it shows that you can make just about production-ready apps out of the box, on the web or internally for an org. As George mentioned, plumber is easy to use: you can have a separate R process that runs a plumber API and communicate with it over HTTPS.

There are a lot of different possibilities with what we currently have. One other approach that should work, and which many others have tried over the years with Shiny, is to package shinylive apps as standalone executables, similar to electricShine. It's pretty straightforward in Node to package a standalone executable. All you would really need to do from there is run shinylive::export() and point an Express web server at the folder with your shinylive app. You would then again need to develop the APIs to whatever data source you're connecting to in Node.
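The R side of that workflow is just the export step; a sketch (the paths are illustrative, and runStaticServer() assumes a recent httpuv):

# Export the Shiny app in "myapp/" to a static site in "site/"
shinylive::export(appdir = "myapp", destdir = "site")

# Any static web server (Express, nginx, ...) can then host "site/".
# For a quick local check:
httpuv::runStaticServer("site")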

I think these two solutions might become common patterns, and could make Shiny something we see a lot more often on the web.

jarodmeng commented 6 months ago

@seanbirchall and @georgestagg Thanks! I will give plumber APIs a try.

nrarnt commented 6 months ago

@seanbirchall Thanks for the comments! Very insightful! Just out of curiosity: what would you use to authenticate users? Would there be an email and password input in the app? Or would you authenticate only the browser?

seanbirchall commented 6 months ago

@nrarnt I would use AWS Cognito, or any other cloud provider's equivalent. The process will look like it does in this app if you click on the Login / Sign up button. So yes, there is an email/password input, but in my case I'm using AWS Cognito's hosted UI; the login/sign-up UI is not written in Shiny. (It currently does not work, but probably will eventually thanks to https://github.com/posit-dev/r-shinylive/issues/28, so don't sign up!)

From there, after a successful log in, users are redirected to the web app with a query parameter called code, which we must read using parseQueryString(session$clientData$url_search). We can then use this code to authenticate to other services that use Cognito + API Gateway for auth.
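In the Shiny server function, reading that parameter looks roughly like this sketch:

server <- function(input, output, session) {
  # Read the ?code=... query parameter appended by the Cognito redirect
  query <- parseQueryString(session$clientData$url_search)
  auth_code <- query[["code"]]
  # auth_code can then be forwarded to Cognito-protected services,
  # e.g. via an rfetch-style POST request
}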

Authentication probably only makes sense when your shinylive app is interacting with other services. So you're not authenticating to the app or the browser, but to services like a database or filesystem that are accessed from a Lambda function and protected by Cognito + API Gateway. We pass this code to those services for authentication. You will likely need something like my rfetch function above to shim a POST request to the service with the relevant data and auth code.

I guess an alternative to 'authenticating to the app' could be having an EC2 instance running Express as a static web server. You point this Express server at your shinylive app for hosting and can use Node to set up some custom auth at the app layer, versus the service layer in the example above. With shinylive there's not much to hide: your source code, services, any API keys, and data are all easy to find, so it's probably best to just use something like Cognito + proxy services, or have no auth at all.

nrarnt commented 6 months ago

@seanbirchall Thanks for the response! I actually already use Cognito with "normal" Shiny applications, but it totally makes sense the way you describe it. The issue I was facing was actually getting the session token from the query parameters, but that can be solved with the issue you linked. Great! Thanks so much for the help!

seanbirchall commented 6 months ago

@nrarnt No problem! I hope I'm not speaking too soon, but in theory this should all just work as soon as passing query parameters works.

seanbirchall commented 5 months ago

@nrarnt A quick band-aid fix I'm using is changing viewerFrameRef.current.src = appInfo.urlPath; to viewerFrameRef.current.src = appInfo.urlPath + window.location.search; in your app.

In destdir/shinylive/shinylive.js there should be two places where the above change needs to be made. With that, my app now accepts query parameters... going to test out Cognito.

nrarnt commented 5 months ago

That's great @seanbirchall ! Thanks a lot for the update!