posit-dev / r-shinylive

https://posit-dev.github.io/r-shinylive/

Fetching external data via httr, Curl not available #31

Closed · scrapeable closed this issue 1 week ago

scrapeable commented 10 months ago

Will webR be able to fetch external data from APIs and whatnot?

> Warning in install(dep, repos, lib, quiet) :
  Requested package curl not found in webR binary repo.
Downloading webR package: sys
Downloading webR package: askpass
Downloading webR package: openssl
Downloading webR package: httr

I am clearly unfamiliar with WASM, but after checking Stack Overflow and asking ChatGPT, it looks like the answer is probably going to be no.

scrapeable commented 10 months ago

I always thought jsonlite used httr behind the scenes for remote files / JSON APIs, so now I'm realizing I'm unfamiliar with parts of R as well.

The below works, though:

library(shiny)
library(jsonlite)

ui <- fluidPage(
    tableOutput('json_api')
)

server <- function(input, output) {
    output$json_api <- renderTable({
        head(
            fromJSON("https://api.github.com/users/hadley/orgs"),
            5
        )
    })
}

shinyApp(ui = ui, server = server)
georgestagg commented 10 months ago

The httr and curl R packages are not currently supported in webR.

The issue is that opening raw network sockets is not permitted from the WebAssembly sandbox. As such, it's not reasonably possible to compile libcurl for Wasm, at least for the moment.

Nevertheless, the base R distribution in webR has been modified to handle calls to download.file() and url() under Wasm, using web browser APIs. For example, the following methods using base R code work:

data_url <- "https://media.githubusercontent.com/media/datablist/sample-csv-files/main/files/customers/customers-100.csv"
download.file(data_url, "data.csv")
readLines(url(data_url))
read.csv(data_url)

It might be possible to fetch the external data you require using these functions. This is most likely why fromJSON() works in your example. However, note that since this generates a cross-origin network request, you are restricted by the web browser's CORS mechanism, so not all external data downloads or API fetches will work. It will depend on whether the web server at the other end of the connection enables CORS support.


Technically, some methods can enable network connections from Wasm, but AFAIK they rely on one of two schemes: 1) Running a WebSocket-to-native-socket proxy server on the host machine outside Wasm. 2) Shimming libraries like libcurl to translate network connections from C/Unix APIs into web browser APIs, where restricted safe networking can be done.

The first method is not a reasonable solution for a project like webR or Shinylive, where we aim to provide a self-contained application without supporting servers outside the web browser.

The second method can partially work, but Curl is a project with a wide range of features and supports many network protocols. At best, we could perhaps support Curl's HTTP and WS features by converting libcurl API calls into web browser API calls, but this would involve a lot of engineering work. Even so, web browser fetch/XHR APIs are restricted for security reasons, so not all types of HTTP connection would be permitted. So while this is on the webR development radar, libcurl support is not planned for the short term.
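To make the second scheme concrete, here is a toy sketch of what such a shim does conceptually: translating a curl-style GET into a web-platform fetch() call. Note that curlGet is a made-up name, not part of webR or libcurl, and a real shim would have to cover far more of libcurl's surface:

```javascript
// Toy illustration of shimming: a curl-like GET implemented on top of
// the fetch() API available in browsers and Node 18+. A real shim must
// also handle options, redirects, streaming, other protocols, etc.
async function curlGet(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return {
    status: res.status,
    headers: Object.fromEntries(res.headers),
    body: await res.text(),
  };
}
```

Even this simple mapping is subject to the browser's restrictions: the shimmed call only succeeds where fetch() itself is permitted, which is exactly the CORS limitation described above.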

scrapeable commented 10 months ago

Thank you for the detailed response, George! Yes, I was sending requests to some other sites and noticed CORS errors in the browser.

For me, I actually only want this to interact with my own APIs where CORS would be enabled, so that's a non-issue. I'm brainstorming how I can restructure my APIs so that they're accessible as just a URL with an api_key or something like that as a query parameter for authentication.

I have a lot of small dashboards / shiny apps that just need a little bit of shiny interactivity, often too much for crosstalk to handle, but still pretty basic apps.

I imagine shimming some of the libcurl functionality would be a feature users would want at some point. When I think of webR, I see a huge opportunity to replace virtually all of my dashboarding / basic web app needs, which for years has been a pain at a number of jobs: IT isn't interested in spinning up compute for a few Shiny apps, and the business isn't invested enough in R to justify any of Posit's products. Even right now, with utils::download.file() and url(), it might be enough for me to achieve what I'm looking to do as I work on my own stack / business. However, I could still run into some of the same issues if I were working for another company, where the problem becomes needing a very specific API endpoint to interact with remote (but likely on the same network) data.

Either way, really enjoying webR so far! All of the above will help me out a lot going forward.

georgestagg commented 10 months ago

> I'm brainstorming how I can change my APIs around so that they're accessible as just a URL with an api_key or something like that as a query parameter for authentication.

Just a quick note to keep in mind that, unlike a traditional Shiny app, there are no secrets in a shinylive app! Your entire app's source code, including any API keys saved within it, is viewable from the client side.

scrapeable commented 8 months ago

A follow-up, which is still related to fetching data, and probably makes more sense here on the r-shinylive repo than on the webr repo.

After reading some of the great posts by @ColinFay, it looks like one of the best ways to interact with data via shinylive/webR will come from mounting file systems. webR provides two APIs, WORKERFS and NODEFS. WORKERFS probably makes sense for smaller tasks where downloading the whole file system is not an issue, and it follows along with the goal of creating self-contained applications.

NODEFS is a little different and requires Node.js running, I assume in the background, similar to how Colin showed in his recent blog posts. I believe that should allow for mounting very large file systems without needing to download them first; they just need to be accessible by the host machine. That opens up a lot of possibilities: letting users interact with all sorts of different data, avoiding re-downloading packages, and a bunch of other things.

So the question I'm trying to get to here is whether shinylive might offer an option to export an app to run under Node.js?

# current
shinylive::export("myapp", "site")

# bundle app with Node.js 
shinylive::export("myapp", "site", backend = TRUE)
# or maybe....
shinylive::export("myapp", "site", useNode = TRUE)

I also see that there is an R API from webr to mount a file system in R under Node.js, so maybe this is something under development? I imagine anyone who gets far enough to launch webR/shinylive from Node.js would just mount it from Node. At the same time, the webr package on npm isn't an R package, while the webr R package is described as

Data and Functions for Web-Based Analysis

So I assume the below code chunk works only when running R from Node? Maybe the above isn't possible and I'd need to write a little bit of Node.js to achieve this.

webr::mount(
  mountpoint = "/data",
  source = "./extra",
  type = "NODEFS"
)

In the spirit of R, it'd be nice to do everything from R, but I totally understand if that's not the case. Either way, it's been exciting watching webR/shinylive!

scrapeable commented 8 months ago

For example, I was hoping something like this would work with shinylive + Node.js. I've created a small R REPL app and exported it to a directory named site. I can serve this shinylive app with ease using Live Server or the Node.js code below, and the Shiny app works as expected. I was hoping it might be this simple to make a directory from my host machine available to my shinylive app. However, I imagine there's a bit more to it, and this needs to happen within whatever webR call is driving the shinylive app.

const express = require('express');
const path = require('path');
const { WebR } = require('webr');

(async () => {
    globalThis.webR = new WebR();
    await globalThis.webR.init();

    console.log("webR ready");

    await globalThis.webR.FS.mkdir("/data");

    await globalThis.webR.FS.mount(
      "NODEFS",
      {
        root: path.join('C:/Users/seanb/Documents/dir-somewhere-else')
      },
      "/data"
    );
})();

const app = express();
const port = process.env.PORT || 3000;

app.use(express.static(path.join(__dirname, 'site')));

app.get('/', (req, res) => {
  res.sendFile(path.join(__dirname, 'site', 'index.html'));
});

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});

Then from my shinylive app I was hoping something like this would work; the dir-somewhere-else directory has 4 files in it. [screenshot: shinylive REPL]

georgestagg commented 1 week ago

Hi @scrapeable,

Whilst webR can work in both a browser and Node, Shinylive has been designed only for use in a web browser. In your example code above there are two separate webR sessions:

- the webR session you start with new WebR() in your Node.js script, and
- the webR session started in the web browser by the exported shinylive app.

They are independent and cannot talk to one another.

The correct way to access local file systems from the web browser is by using the JavaScript File System API. However, using this in webR is difficult for deep technical reasons. We plan to support it in the future, in a similar way to NODEFS and WORKERFS, but are currently blocked due to those reasons. The issue to watch for progress on this is https://github.com/r-wasm/webr/issues/56.


I'm going to close this for now, in the interest of keeping things neat and tidy, since we've gone off track from the original issue title. For those coming for information on httr and curl, see my comment at https://github.com/posit-dev/r-shinylive/issues/31#issuecomment-1786739311 for a workaround using base R's url() function.