yihui / xfun

Miscellaneous R functions
https://yihui.org/xfun/

embed_file is limited to 2MB file size in Chrome #23

Open grst opened 4 years ago

grst commented 4 years ago

I used xfun::embed_file with a file several MB in size. While the link works perfectly in Firefox, I get a network error when trying to download the same file in Chrome.

I figured out that there appears to be a 2 MB size limit for data URLs in Chrome.
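
For context, a data URL carries the whole file as base64 text inside the link itself, so its length grows with the file. The sketch below is illustrative only: `toDataUrl` and `LIMIT_BYTES` are hypothetical names, and treating the reported 2 MB limit as a limit on the encoded URL length is an assumption, not something confirmed in this thread.

```javascript
// Hypothetical helper: build a base64 data URL from raw bytes.
function toDataUrl(bytes, mimeType) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Assumed threshold, taken from the report above (2 MB).
const LIMIT_BYTES = 2 * 1024 * 1024;

const small = toDataUrl(new Uint8Array([72, 105]), 'text/plain');
console.log(small);                      // data:text/plain;base64,SGk=
console.log(small.length > LIMIT_BYTES); // false

// A 3 MB payload: base64 emits 4 characters per 3 input bytes.
const big = toDataUrl(new Uint8Array(3 * 1024 * 1024), 'application/octet-stream');
console.log(big.length > LIMIT_BYTES);   // true
```

Because of the 4:3 expansion, a file noticeably smaller than the threshold can already produce a data URL that exceeds it.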



yihui commented 4 years ago

Many thanks for the information! It is very helpful. I don't have time at the moment. If anyone wants to help (blob URLs seem promising to me), please feel free to send a pull request.

yihui commented 4 years ago

I did some research on this issue, and there is one challenge that is beyond my capability to solve. I think the implementation would be like below. We generate an <a> tag from R of this form (not tested):

<a onclick="var blob = new Blob([new Uint8Array([X])], {type: Y}); this.href = URL.createObjectURL(blob);">
Download
</a>

where X is an array of integers, which can be generated from

paste(as.integer(xfun:::read_bin(FILE)), collapse = ',')

and Y is mime::guess_type(FILE).

The challenge is that this text representation of the FILE content is very inefficient (much less efficient than base64 encoding; see binary-to-text encoding for more). In the SO post you mentioned, the author said:

Blobs are pure binary byte-arrays which does not have any significant overhead as Data-URI does, which makes them faster and smaller to handle.

I don't know how to avoid the overhead here.

For now, I'll mention the file size limit on the help page. Thanks again!

grst commented 4 years ago

I see what you mean. Maybe one option would be to still send the data as base64 and convert this to a blob array as described in another SO post (also answer 2 seems nice).

This still wouldn't be perfect efficiency-wise, but probably better than sending the text representation of an integer array.


(that being said, I'm fine if the limit is 'just' documented)

yihui commented 4 years ago

I just briefly tested this way of creating the blob url:

this.href = URL.createObjectURL(
  new Blob(Uint8Array.from(atob(X), c => c.charCodeAt(0)), {type: Y})
);

but it didn't seem to work. I'll leave it here and wait for other experts to help.
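
One possible explanation (an assumption; the original test setup is not shown) is that the `Blob` constructor expects an array of parts, so the `Uint8Array` has to be wrapped in `[...]` rather than passed directly. A minimal sketch, where `base64ToBlob` is a hypothetical helper name:

```javascript
// Hypothetical helper: decode a base64 string into a Blob of the given MIME type.
function base64ToBlob(base64, mimeType) {
  // atob() yields a "binary string": one character per decoded byte.
  const bytes = Uint8Array.from(atob(base64), c => c.charCodeAt(0));
  // The first argument must be an array of parts, hence the [bytes] wrapper.
  return new Blob([bytes], { type: mimeType });
}

const blob = base64ToBlob('SGVsbG8=', 'text/plain'); // base64 of "Hello"
console.log(blob.size, blob.type); // 5 text/plain
```

In a browser, `URL.createObjectURL(blob)` would then supply the value for `this.href`.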

fmmattioni commented 2 years ago

I have just implemented this blob generation in {downloadthis}.

It goes like this:

https://github.com/fmmattioni/downloadthis/blob/master/R/utils.R#L44-L67

create_blob <- function(tmp_file, output_file) {
  ## get type of file
  type_file <- mime::guess_type(file = tmp_file)
  ## read the whole file as raw bytes (a fixed cap such as 10e6 would truncate larger files)
  bin_file <- readBin(tmp_file, "raw", n = file.size(tmp_file))
  bin_file <- jsonlite::toJSON(bin_file, raw = "js")
  ## produce js function
  js_function <- glue::glue(
    "
    const myBlob = new Blob([{{bin_file}}], { type: '{{type_file}}' });
    const downloadURL = window.URL.createObjectURL(myBlob);
    const a = document.createElement('a');
    document.body.appendChild(a);
    a.href = downloadURL;
    a.download = '{{output_file}}';
    a.click();
    window.URL.revokeObjectURL(downloadURL);
    document.body.removeChild(a);
    ",
    .open = "{{",
    .close = "}}"
  )
  js_function
}

and then the link can be generated like so:

htmltools::a(
  onclick = create_blob(tmp_file = tmp_file, output_file = output_file)
)

yihui commented 2 years ago

@fmmattioni Thanks! That is pretty much an implementation of https://github.com/yihui/xfun/issues/23#issuecomment-631108575. As I said, the text representation of the file (as a Uint8Array) is very inefficient. From my quick tests, the text representation is about 3 times larger than the original file, i.e., roughly 4 times its size. That is, for a 5 MB file, the embedded "file" is about 20 MB.
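
That estimate can be sanity-checked with a little arithmetic. The sketch below (function names are illustrative) counts the characters needed to write bytes as comma-separated decimal integers and compares that with base64, which uses 4 characters per 3 bytes. It assumes uniformly distributed byte values, which real files need not follow.

```javascript
// Characters needed to write `bytes` as comma-separated decimal integers.
function integerTextLength(bytes) {
  let n = 0;
  for (const b of bytes) n += String(b).length; // 1 to 3 digits per byte
  return n + Math.max(bytes.length - 1, 0);     // commas between values
}

// Characters needed for padded base64.
function base64Length(nBytes) {
  return 4 * Math.ceil(nBytes / 3);
}

// Every byte value 0..255 exactly once, i.e. a uniform distribution.
const sample = Uint8Array.from({ length: 256 }, (_, i) => i);

console.log(integerTextLength(sample) / sample.length);   // about 3.57
console.log(base64Length(sample.length) / sample.length); // about 1.34
```

Under that assumption the integer form costs roughly 3.6 characters per byte, in line with the quoted estimate, while base64 costs about 1.33.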

fmmattioni commented 2 years ago

@yihui would you recommend getting the blob from the base64 string then?

fmmattioni commented 2 years ago

the implementation could be like the following:

get_data_uri <- function(tmp_file) {
  paste0(
    "data:",
    mime::guess_type(file = tmp_file),
    ";base64,",
    base64enc::base64encode(tmp_file)
  )
}

create_blob <- function(tmp_file, output_file) {
  base64 <- get_data_uri(tmp_file)
  ## produce js function
  js_function <- glue::glue(
    "
    fetch('{{base64}}').then(res => res.blob()).then(myBlob => {
      const downloadURL = window.URL.createObjectURL(myBlob);
      const a = document.createElement('a');
      document.body.appendChild(a);
      a.href = downloadURL;
      a.download = '{{output_file}}';
      a.click();
      window.URL.revokeObjectURL(downloadURL);
      document.body.removeChild(a);
    });
    ",
    .open = "{{",
    .close = "}}"
  )
  js_function
}

yihui commented 2 years ago

That looks better. Thanks for sharing!