r-wasm / webr

The statistical language R compiled to WebAssembly via Emscripten, for use in web browsers and Node.
https://docs.r-wasm.org/webr/latest/

Support for external database networking #129

Open gregvolny opened 1 year ago

gregvolny commented 1 year ago

Support for relational databases would be a very important milestone for webR, even though Emscripten's limited support for sockets in the browser environment constrains this a lot. However, the latest Emscripten version includes some workarounds, and the idea is now more feasible using one of the following approaches: a) POSIX TCP sockets over WebSockets; b) full POSIX sockets over a WebSocket proxy server.

The WordPress WASM team had some results using fetch(): https://github.com/WordPress/wordpress-playground/issues/85. Postgres WASM instead uses https://github.com/benjamincburns/websockproxy, but on top of x86 virtualization in the browser.

gregvolny commented 1 year ago

The latest networking upgrade in Emscripten would facilitate supporting DBI. Please take a look at: https://github.com/WordPress/wordpress-playground/issues/116

gregvolny commented 1 year ago

FYI, an example of an RDBMS connection in WASM: https://github.com/WordPress/wordpress-playground/pull/119

georgestagg commented 1 year ago

Better support for networking is an interesting potential development direction for webR, but unfortunately I can see two immediate issues that mean the particular approach used by the WordPress team would not easily transfer to webR.

1) Asynchronous networking raises a problem in that we do not use Asyncify when building webR with Emscripten. The overhead involved in using Asyncify is high, as a consequence of the way the R interpreter works. In addition to a performance cost, at the moment enabling Asyncify causes stack overflows in certain browsers. Without Asyncify, we cannot use asynchronous JavaScript APIs like WebSockets because we are not able to yield to the event loop.

2) On the other hand, browser-based Wasm implementations do not expose raw POSIX sockets due to the security model in place. IIUC the support provided by Emscripten for this involves essentially emulating traditional sockets with WebSocket traffic.
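To illustrate the first point, here is a minimal TypeScript sketch of why code that cannot yield to the event loop never observes the result of an asynchronous API. This is not webR code, and the function names are made up for this example. A synchronous busy-wait blocks the very loop that would deliver the result, whereas an `async` function can suspend at `await`. Asyncify exists to give synchronous Wasm code an equivalent way to pause and resume.

```typescript
// Hypothetical illustration, not part of webR or Emscripten.

// Spin synchronously while waiting for an already-resolved promise.
// The `.then` callback is queued as a microtask, which can only run once
// the current synchronous code has returned, so the flag never flips here.
function spinWaitForReply(maxSpins: number): boolean {
  const state = { settled: false };
  Promise.resolve("reply").then(() => {
    state.settled = true;
  });
  for (let i = 0; i < maxSpins && !state.settled; i++) {
    // Busy-wait: nothing else can run while we are inside this loop.
  }
  return state.settled;
}

// The asynchronous equivalent suspends at `await`, letting the event loop
// run the callback before execution continues.
async function awaitReply(): Promise<boolean> {
  const state = { settled: false };
  await Promise.resolve("reply").then(() => {
    state.settled = true;
  });
  return state.settled;
}
```

However large `maxSpins` is, `spinWaitForReply` returns `false`, while `awaitReply()` resolves to `true`.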

Even if we could easily use WebSockets in the webR worker thread, making this work would require either a database server that listens for WebSocket traffic rather than TCP socket traffic, or a proxy running on the user's local machine, outside the browser and webR, to translate the socket traffic from one to the other.

In my opinion running a proxy server outside of the browser, while a technical solution, does not fit within the primary goals of the webR project. At the moment, we are emphasising development for a self-contained solution without the need to install external software like proxy servers on end user devices.

That leaves the potential solution of database software that supports WebSocket connections natively. I am not aware if any such database servers already exist and are in popular use - I will make a mental note to research this. I think this solution would match better with our goals for webR but first requires serious changes to current database software by the developers of those projects, rather than any short term solution that can be implemented in webR.

gregvolny commented 1 year ago

@georgestagg Thanks a lot for this clarification. However, as you can understand, support for persistent storage and networked databases is very important for webR.

**The overhead involved in using Asyncify is high, as a consequence of the way the R interpreter works. In addition to a performance cost, at the moment enabling Asyncify causes stack overflows in certain browsers** A theoretical approach would be to offer an option where PHP Wasm manages database access and outputs SQL query results as a list, array, or hashmap for webR to consume. However, I imagine the first issue would be the combined size of the webR package plus PHP Wasm... What do you think about this approach?
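As a rough sketch of the hand-off suggested here, the snippet below pivots row-oriented query results into named columns, a shape that maps naturally onto an R named list or data frame. It is written in TypeScript rather than PHP purely for illustration. `rowsToColumns` and the sample rows are hypothetical and are not part of webR, PHP Wasm, or any database driver.

```typescript
// Hypothetical helper for illustration only.
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;
type Columns = Record<string, Cell[]>;

// Pivot an array of row objects into an object of equal-length column arrays.
// Keys missing from a row are filled with null so every column has one entry per row.
function rowsToColumns(rows: Row[]): Columns {
  const columns: Columns = {};
  rows.forEach((row, rowIndex) => {
    for (const name of Object.keys(row)) {
      let column = columns[name];
      if (column === undefined) {
        // First time this column appears: backfill earlier rows with null.
        column = [];
        for (let i = 0; i < rowIndex; i++) column.push(null);
        columns[name] = column;
      }
      column.push(row[name] ?? null);
    }
    // Pad any column that this row did not mention.
    for (const name of Object.keys(columns)) {
      const column = columns[name];
      if (column !== undefined && column.length <= rowIndex) column.push(null);
    }
  });
  return columns;
}

// Made-up rows standing in for a SQL query result.
const sampleRows: Row[] = [
  { id: 1, name: "ada" },
  { id: 2, name: "grace", active: true },
];
const sampleColumns = rowsToColumns(sampleRows);
// sampleColumns["id"]     -> [1, 2]
// sampleColumns["name"]   -> ["ada", "grace"]
// sampleColumns["active"] -> [null, true]
```

A column-oriented shape is chosen because R data frames are themselves lists of equal-length columns, so less reshaping would be needed on the R side.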

**At the moment, we are emphasising development for a self-contained solution without the need to install external software like proxy servers on end user devices.** I agree with this approach of having a standalone, ready-to-use webR.

gregvolny commented 1 year ago

**1. Asynchronous networking raises a problem in that we do not use Asyncify when building webR with Emscripten. The overhead involved in using Asyncify is high, as a consequence of the way the R interpreter works. In addition to a performance cost, at the moment enabling Asyncify causes stack overflows in certain browsers. Without Asyncify, we cannot use asynchronous JavaScript APIs like WebSockets because we are not able to yield to the event loop.**

Others are exploring the JSPI API (https://v8.dev/blog/jspi) instead of Asyncify to bridge synchronous Wasm code with asynchronous JS Web APIs. I think it's something you should also explore.

lionel- commented 1 year ago

@htiuser Stack-switching is on our radar but it's too early at the moment.

gregvolny commented 1 year ago

Hello @lionel- and @georgestagg, I can see DBI in https://github.com/r-wasm/webr-repo/blob/main/repo-packages, but no RSQLite or a persistent OPFS-backed SQLite (https://github.com/tomayac/sqlite-wasm). Could we please have these, as well as RMySQL? I also tested Node.js in the browser: https://stackblitz.com/edit/vitejs-vite-ttrbwh?file=main.js. So it is becoming easier to access any RDBMS from a browser environment. Here are some ideas from the WordPress Playground team: https://github.com/WordPress/wordpress-playground/issues/85, as well as some libraries: https://github.com/WordPress/wordpress-playground/tree/trunk/packages/php-wasm/node/src/lib/networking. Thanks in advance.

georgestagg commented 1 year ago

Hi @HTUser-1,

The latest version of webR supports building sqlite for Wasm, for use by Wasm R packages, and RSQLite should now be available at https://webr.r-wasm.org/latest/. This will ship with webR v0.2.0.

OPFS is on my radar, I expect work will commence some time after 0.2.0 is released, along with any related minor patch releases.

As mentioned in previous comments, raw socket/TCP access from Wasm requires an external proxy server to convert WebSocket traffic into traditional network traffic. We currently consider this out of scope in the short term, but we may revisit it in the future once the webR project is more mature.

gregvolny commented 2 weeks ago

**As mentioned in previous comments, raw socket/TCP access from Wasm requires an external proxy server to convert WebSocket traffic into traditional network traffic. We currently consider this out of scope in the short term, but we may revisit it in the future once the webR project is more mature.**

Hello @georgestagg,

pgmock differs from previous Postgres-in-the-browser projects by providing full feature-compatibility entirely inside the JavaScript runtime, without depending on a network proxy for communication. We did this by simulating a network stack in JavaScript that behaves like a real network, that can simulate TCP connections even on platforms that do not allow raw socket access. https://github.com/stack-auth/pgmock

I think it's something you could explore and draw inspiration from. It's also related to: https://github.com/r-wasm/webr/issues/490#issue-2553894543
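The general idea of a simulated connection that lives entirely inside the JavaScript runtime can be sketched roughly as below. This is only an illustration under simplifying assumptions (synchronous delivery, no connection close, no flow control); `InMemorySocket` is a hypothetical name and this is not pgmock's implementation or API.

```typescript
// Hypothetical sketch: not pgmock's implementation or API.
type DataHandler = (chunk: Uint8Array) => void;

// One end of an in-memory, bidirectional byte pipe that mimics a connected socket.
class InMemorySocket {
  private peer: InMemorySocket | null = null;
  private handler: DataHandler | null = null;
  private pending: Uint8Array[] = [];

  // Create two sockets wired to each other, like the two ends of a connection.
  static pair(): [InMemorySocket, InMemorySocket] {
    const a = new InMemorySocket();
    const b = new InMemorySocket();
    a.peer = b;
    b.peer = a;
    return [a, b];
  }

  // Register a receive callback and flush anything that arrived earlier.
  onData(handler: DataHandler): void {
    this.handler = handler;
    const queued = this.pending;
    this.pending = [];
    for (const chunk of queued) handler(chunk);
  }

  // "Send" bytes by handing a copy straight to the peer; no network is involved.
  write(chunk: Uint8Array): void {
    if (this.peer === null) throw new Error("socket is not connected");
    this.peer.receive(new Uint8Array(chunk));
  }

  private receive(chunk: Uint8Array): void {
    if (this.handler !== null) this.handler(chunk);
    else this.pending.push(chunk);
  }
}

// Usage: a toy "server" that replies with each byte incremented by one.
const [clientEnd, serverEnd] = InMemorySocket.pair();
serverEnd.onData((chunk) => {
  const reply = new Uint8Array(chunk.length);
  chunk.forEach((byte, i) => {
    reply[i] = byte + 1;
  });
  serverEnd.write(reply);
});

const receivedBytes: number[] = [];
clientEnd.onData((chunk) => {
  chunk.forEach((byte) => {
    receivedBytes.push(byte);
  });
});
clientEnd.write(new Uint8Array([1, 2, 3]));
// receivedBytes is now [2, 3, 4]
```

Because both ends are plain objects in the same runtime, no proxy or raw socket access is needed; the trade-off is that the database itself must also run inside that runtime.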

seanbirchall commented 1 week ago

@gregvolny what about duckdb? duckdb-wasm works just fine with shinylive right now (App). It also looks like there will very soon be support for more restricted / authenticated access via presigned URLs, as well as OPFS support: https://github.com/duckdb/duckdb-wasm/pull/1856. Something similar should be doable with sqlite as well, though you will be using JS to interact with it, not R, similar to what I'm doing above.

Though this might not be relevant to you if you're interested in webR rather than shinylive. Either way, maybe this helps; I'm just coming from your comment on the shinylive repo.