r-wasm / webr

The statistical language R compiled to WebAssembly via Emscripten, for use in web browsers and Node.
https://docs.r-wasm.org/webr/latest/

Mounting with non-standard metadata URL does not work / Mounting with `Blob` objects from main JS thread does not work #486

Open dipterix opened 1 month ago

dipterix commented 1 month ago

Dear webr devs,

I generated my dataset using rwasm::file_packager. However, when I try to load it directly from JS, the browser complains:

Uncaught (in promise) TypeError: can't convert undefined to BigInt
    doStat https://webr.r-wasm.org/v0.4.2/R.bin.js:2061
    ___syscall_stat64 https://webr.r-wasm.org/v0.4.2/R.bin.js:2061

To replicate, you can download the data from here: https://zenodo.org/records/13825852

When loading from webr::mount locally, everything is fine.

Here is a screenshot of the results with webr::mount; it reads the files correctly:

[Screenshot: files read correctly via webr::mount]

However, it fails when I deploy the website and load the data by URL:

(For testing: I'm using the Zenodo API, which supports CORS, so the URL itself is not the issue. The same error occurs when testing locally.)

const path = "/home/web_user/rave_data";
const data = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.data/content";
const meta = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.js.metadata/content";

await webR.FS.mkdir(path);

// Download image data
const _data = await fetch(data);
const _metadata = await fetch(meta);

// Mount image data
const options = {
  packages: [{
    blob: await _data.blob(),
    metadata: await _metadata.json(),
  }],
}

await webR.FS.mount("WORKERFS", options, path);

await webR.evalR("readLines('~/rave_data/raw_dir/DemoSubject/rave-imaging/fs/mri/transforms/talairach.xfm')");

Error:

Uncaught (in promise) TypeError: can't convert undefined to BigInt
    doStat https://webr.r-wasm.org/v0.4.2/R.bin.js:2061
    ___syscall_stat64 https://webr.r-wasm.org/v0.4.2/R.bin.js:2061
    safeEval blob:http://localhost:3530/40ef506b-683b-40b4-9dfc-512b84137000:2087
    captureR blob:http://localhost:3530/40ef506b-683b-40b4-9dfc-512b84137000:9289
    evalR blob:http://localhost:3530/40ef506b-683b-40b4-9dfc-512b84137000:9338
    dispatch blob:http://localhost:3530/40ef506b-683b-40b4-9dfc-512b84137000:8948
    PostMessageChannelWorker blob:http://localhost:3530/40ef506b-683b-40b4-9dfc-512b84137000:4320
    error.ts:11:4

Browser: Firefox 130.0.1 (64-bit)
OS: macOS Sequoia (Apple M2)
webR: 0.4.2
rwasm: 0.2.0.9000

dipterix commented 1 month ago

Also, why is there no metadata path argument in webr::mount? It seems that webR derives the metadata URL from the data URL. If I could supply the metadata URL myself, then maybe I wouldn't have to use webR.FS.mount.

georgestagg commented 1 month ago

Hi,

Yes, webR derives the metadata URL from the data URL. This is why you cannot use your Zenodo URLs: they end in [...]/content, which webR is not expecting, so the automatic derivation fails in your case. I think the right solution here is to provide an optional argument as part of the webr::mount() API for users to give their own metadata URLs where appropriate. The argument does not currently exist, but it should not be too difficult to add.
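
Sketching roughly what that could look like (the metadata argument below is hypothetical and does not exist in the current webr release; the other arguments follow the existing webr::mount() signature):

# Hypothetical sketch only: the `metadata` argument is not yet part of webr::mount()
webr::mount(
  mountpoint = "/home/web_user/rave_data",
  source = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.data/content",
  metadata = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.js.metadata/content"
)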


Even with the above, your other method should work. However, it looks like there is currently a bug in how Blob objects are transferred to the webR worker thread before they are passed on to the Emscripten FS API. I'm sorry about that; we'll try to get it fixed for the next release. With a fix in place, your JS code should work as is.


If it is urgent, you can work around the bug by doing the work synchronously on the webR worker thread, avoiding the problem with transferring the Blob object. However, this is extremely messy, so it is only worth implementing as a temporary measure:

# The mount point must exist before mounting.
dir.create("/home/web_user/rave_data")

webr::eval_js('
  const path = "/home/web_user/rave_data";
  const data = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.data/content";
  const meta = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.js.metadata/content";

  // Synchronous XHR is acceptable here because this runs on the webR worker
  // thread, which avoids transferring a Blob across threads.
  const data_req = new XMLHttpRequest();
  data_req.responseType = "arraybuffer";
  data_req.open("GET", data, false);
  data_req.send(null);
  const _data = data_req.response;

  const meta_req = new XMLHttpRequest();
  meta_req.responseType = "json";
  meta_req.open("GET", meta, false);
  meta_req.send(null);
  const _meta = meta_req.response;

  const options = {
    packages: [{
      blob: new Blob([_data]),
      metadata: _meta,
    }],
  };

  // Mount the filesystem image directly with the Emscripten FS API.
  Module.FS.mount(Module.FS.filesystems.WORKERFS, options, path);
')

readLines('~/rave_data/raw_dir/DemoSubject/rave-imaging/fs/mri/transforms/talairach.xfm')

[Screenshot: the workaround reading talairach.xfm successfully]

dipterix commented 1 month ago

Thanks for checking it. Now I see what's going on... so blobs are converted to typed arrays when transferred to workers. No wonder that when I console.log the options, the blob shows up as a typed array.

Currently my workaround is to manually slice the data and use webR.FS.writeFile to generate the directory tree. It worked great, haha (see https://rave.wiki/posts/3dviewer/viewer201.html).
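
The idea looks roughly like the sketch below; the file list and URLs are placeholders rather than the real RAVE layout, and it only uses the existing webR.FS calls:

const base = "/home/web_user/rave_data";
await webR.FS.mkdir(base);

// Placeholder list: each entry maps a target path to a URL serving that single file.
const files = [
  { path: `${base}/example.txt`, url: "https://example.org/example.txt" },
];

for (const f of files) {
  const resp = await fetch(f.url);
  const bytes = new Uint8Array(await resp.arrayBuffer());
  // writeFile copies the bytes into webR's virtual filesystem on the worker thread.
  await webR.FS.writeFile(f.path, bytes);
}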

I do have a wishlist item: converting WORKERFS to IDBFS. Currently, every time users open the website they need to download the data from the internet. This is fine if the connection is fast, but it wastes resources and time when viewing on a phone or on metered Wi-Fi. If the data could be stored on the client side, this might speed things up significantly. Any suggestions on how I can implement this feature?

georgestagg commented 1 month ago

So blobs are converted to typed arrays when transferred to workers

Yes, we are forced to make the conversion for annoying but unrelated technical reasons. In the future when we're able to work around the issue we'll be able to return to transferring Blob objects with a more traditional JS postMessage() transfer instead.

I do have a wishlist item: converting WORKERFS to IDBFS.

This should work, but it hasn't been tested heavily. If you do investigate this method, don't forget to run syncfs() at the correct times with the correct arguments.
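
If you do try it, here is a very rough, untested sketch following the same eval_js pattern as above. It assumes the IDBFS driver is available in the webR build, and the mount point is only an example:

# The mount point must exist before mounting.
dir.create("/home/web_user/rave_data")

webr::eval_js('
  const path = "/home/web_user/rave_data";

  // Mount an IndexedDB-backed filesystem instead of WORKERFS.
  Module.FS.mount(Module.FS.filesystems.IDBFS, {}, path);

  // populate = true reads any previously persisted state FROM IndexedDB.
  // Note that syncfs() is asynchronous; the callback fires once syncing finishes.
  Module.FS.syncfs(true, (err) => { if (err) console.error(err); });
')

# ... download or write data under /home/web_user/rave_data ...

webr::eval_js('
  // populate = false persists the current contents back TO IndexedDB.
  Module.FS.syncfs(false, (err) => { if (err) console.error(err); });
')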

If the data can be stored on the client side, then this might significantly speed up.

I did notice that your data was not being cached by my browser when I was testing. How much control do you have over the content HTTP headers? If you can configure Zenodo to send the content with HTTP header Cache-Control: max-age=86400 (or some other larger number) the web browser should automatically cache this download and restore it from disk cache on subsequent reloads, rather than re-downloading the entire dataset each time.

dipterix commented 1 month ago

In the future when we're able to work around the issue we'll be able to return to transferring Blob objects with a more traditional JS postMessage() transfer instead.

Or could webR.FS.mount accept typed arrays? It's easy to make Blobs out of the arrays. Just a quick thought, since typed arrays are naturally supported by JS workers; you would only need to convert to Blobs at the final Module.FS.mount step.

This should work, but it hasn't been tested heavily. If you do investigate this method, don't forget to run syncfs() at the correct times with the correct arguments.

Thanks, I will try it.

I did notice that your data was not being cached by my browser when I was testing. How much control do you have over the content HTTP headers? If you can configure Zenodo to send the content with HTTP header Cache-Control: max-age=86400 (or some other larger number) the web browser should automatically cache this download and restore it from disk cache on subsequent reloads, rather than re-downloading the entire dataset each time.

I don't have control over Zenodo. I guess I'll try caching by myself for now (rough sketch below). Good to know that you can set cache controls; then this is an edge case and won't appear too often in the future :)
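
Roughly, the manual caching I have in mind would use the browser Cache API on the main thread before handing the data to webR; the cache name and structure here are just a sketch:

// Sketch: keep a copy of the Zenodo download in the browser Cache storage so
// repeat visits read from disk instead of re-downloading the whole dataset.
async function fetchWithCache(url) {
  const cache = await caches.open("rave-data-v1");
  let resp = await cache.match(url);
  if (!resp) {
    resp = await fetch(url);
    // Store a clone; the original response stays readable below.
    await cache.put(url, resp.clone());
  }
  return resp;
}

const dataUrl = "https://zenodo.org/api/records/13825852/files/project-demo-minimal.data/content";
const resp = await fetchWithCache(dataUrl);
const bytes = new Uint8Array(await resp.arrayBuffer());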