microbiomedata / nmdc-edge

Web-based interface to the NMDC EDGE platform
https://nmdc-edge.org

Cannot upload files larger than 500 Mebibytes via web UI (Jetstream2) #218

Open mflynn-lanl opened 2 days ago

mflynn-lanl commented 2 days ago

File uploads worked for 43 MB and 380 MB files, but failed for a 1.38 GB file.

mflynn-lanl commented 2 days ago

Tried to upload a 1.38 GB file.

Image

eecavanna commented 1 day ago

I'm curious what the following show when that error occurs:

mflynn-lanl commented 1 day ago

From the JavaScript console:

https://edge-dev.microbiomedata.org/auth-api/user/upload/add
net::ERR_CONNECTION_CLOSED
Promise.then (async)        
(anonymous) @   util.js:32
u   @   util.js:30
(anonymous) @   Uploadfiles.js:108
c   @   Uploadfiles.js:100
onSubmit    @   Uploadfiles.js:98

There isn't anything in the log files related to this.

eecavanna commented 1 day ago

Thanks for sharing the console snippet and note about the logs.

I checked the logs early Tuesday morning (PT) and here are the lines I saw with that path in them:

// $ docker compose logs app | grep '/auth-api/user/upload/add'

exouser-app-1  | 2024-06-30 18:40:39 debug:     /auth-api/user/upload/add: {"name":"Ecoli_10x-int.fastq.gz","type":"fastq.gz","size":"43000093"}
exouser-app-1  | 2024-06-30 19:51:27 debug:     /auth-api/user/upload/add: {"name":"SRR7877884-int-0.1.fastq.gz","type":"fastq.gz","size":"385094750"}
exouser-app-1  | 2024-07-01 15:10:41 debug:     /auth-api/user/upload/add: {"name":"SRR7877884-int-0.1.fastq.gz","type":"fastq.gz","size":"385094750"}
exouser-app-1  | 2024-07-01 23:08:35 debug:     /auth-api/user/upload/add: {"name":"Ecoli_10x-int.fastq.gz","type":"fastq.gz","size":"43000093"}
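To compare the sizes of the uploads that did reach the app, the `size` field can be pulled out of log lines like the ones above. This is a minimal sketch that assumes the log format shown here; the function name `logged_upload_sizes` is made up for illustration and is not part of nmdc-edge.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nmdc-edge): read app log lines on stdin and
# print the "size" value of each /auth-api/user/upload/add entry, one per line.
logged_upload_sizes() {
  grep '/auth-api/user/upload/add' \
    | grep -o '"size":"[0-9]*"' \
    | tr -cd '0-9\n'
}

# Example usage, assuming the same compose service name ("app") as above:
#   docker compose logs app | logged_upload_sizes | sort -n | tail -n 1
```

For the four lines above, the largest logged size is 385094750 bytes, so none of the logged requests is close to 1.38 GB.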

One more thing I'm interested in seeing is the "Network" tab of the web browser's DevTools, when the upload is happening. I want to see what stage the request gets to.

eecavanna commented 1 day ago

I see that the default value of the config.FILE_UPLOADS.MAX_FILE_SIZE_BYTES config variable is 10 GB, and we are not overriding it anywhere; so, I don't think it's an issue of that config variable's value being too low.

https://github.com/microbiomedata/nmdc-edge/blob/7bcee5bff833575c6a84eefc6132b7efd4c11c4e/webapp/server/config.js#L148

eecavanna commented 20 hours ago

I reproduced the symptom with a 1 GB file.

I created the file by running (on my laptop):

head -c 1000000000 /dev/urandom > my-fake-1-gigabyte.fastq.gz

When I tried uploading it to the web app, the upload failed after 2 minutes.

image

Note: I confirmed that I could successfully upload a 1 MB file (also generated from /dev/urandom). I may try some files at intermediate sizes (e.g. 750 MB).
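For intermediate sizes, the same `head -c` approach can be wrapped in a small function. This is only a sketch; the function name `make_fake_fastq` and the output file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a file containing exactly N random bytes.
# Usage: make_fake_fastq <size_in_bytes> <output_path>
make_fake_fastq() {
  local size_bytes="$1" out_path="$2"
  head -c "$size_bytes" /dev/urandom > "$out_path"
}

# Example: a 750 MB (decimal) file, followed by a byte-count check.
#   make_fake_fastq 750000000 my-fake-750-megabyte.fastq.gz
#   wc -c < my-fake-750-megabyte.fastq.gz
```

The content is random bytes rather than a valid gzip archive, which is enough for testing the upload path but not for anything that tries to decompress the file.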

eecavanna commented 19 hours ago

I see this post in the Caddy community forum. Note that this is not in response to anything I've posted—it's an existing forum thread from 2023.

image

See also: https://caddyserver.com/docs/caddyfile/options#timeouts

eecavanna commented 19 hours ago

Based upon recent experiments (not shown above), I think the limit is 500 Mebibytes (MEga-BInary-Bytes).

I'll test that next.

$ head -c 523239424 /dev/urandom > my-fake-499-mebibyte.fastq.gz
$ head -c 524288000 /dev/urandom > my-fake-500-mebibyte.fastq.gz
$ head -c 525336576 /dev/urandom > my-fake-501-mebibyte.fastq.gz

eecavanna commented 19 hours ago

I confirmed I could upload a 499 Mebibyte file. It took 3.5 minutes.

image

eecavanna commented 18 hours ago

Uploading a 501 Mebibyte file failed after 2.1 minutes.

image

eecavanna commented 18 hours ago

@mflynn-lanl, the max upload size seems to me to be 500 Mebibytes (which is 524,288,000 Bytes). I consider that to be a "clue" as to what is imposing the limit. I'll keep an eye out for that value as I continue looking for a culprit.

eecavanna commented 17 hours ago

I just learned that Cloudflare imposes limits on the size of the request body. I think we are on an Enterprise Cloudflare plan, which has, by default, a "500 MB" limit (as documented here). Assuming "MB" there means Mebibytes, that is consistent with what I have observed (see previous comments).
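The arithmetic behind the Mebibyte reading can be sketched as follows. The helper names `mb_to_bytes` and `mib_to_bytes` are made up for illustration; the byte counts are the ones used for the test files earlier in this thread.

```shell
#!/usr/bin/env bash
# Decimal megabytes (MB, 1000 * 1000 bytes) vs. binary mebibytes
# (MiB, 1024 * 1024 bytes), both converted to bytes.
mb_to_bytes()  { echo $(( $1 * 1000 * 1000 )); }
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

for n in 499 500 501; do
  echo "$n MiB = $(mib_to_bytes "$n") bytes"
done
echo "500 MB  = $(mb_to_bytes 500) bytes"
# Prints:
#   499 MiB = 523239424 bytes
#   500 MiB = 524288000 bytes
#   501 MiB = 525336576 bytes
#   500 MB  = 500000000 bytes
```

The 499 Mebibyte file (523,239,424 bytes) uploaded successfully even though it is larger than 500,000,000 bytes, so a decimal 500 MB limit would not match the observations, while a 524,288,000-byte limit would.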

eecavanna commented 17 hours ago

We could turn off Cloudflare's "Proxy" feature for this edge-dev subdomain so that the HTTP request bypasses Cloudflare, but we'd lose out on the benefits of having it turned on; e.g. automatic SSL certificate provisioning and renewal, protection against certain types of network attacks.

eecavanna commented 19 minutes ago

I turned off Cloudflare's "Proxy" feature for the edge-dev subdomain. Then, I confirmed I could upload a 501 Mebibyte file successfully (previously, uploading the same file failed).

image
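One way to double-check from the command line whether responses still pass through Cloudflare's proxy is to look at the response headers: proxied responses normally carry a `cf-ray` header and `server: cloudflare`. The sketch below assumes that heuristic; the function name `looks_cloudflare_proxied` is made up for illustration, and the hostname is the one from the console snippet earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and succeed (exit 0)
# if they contain a "cf-ray" header or "server: cloudflare"
# (matched case-insensitively at the start of a line).
looks_cloudflare_proxied() {
  grep -qiE '^(cf-ray:|server:[[:space:]]*cloudflare)'
}

# Example usage (requires network access):
#   curl -sI https://edge-dev.microbiomedata.org/ | looks_cloudflare_proxied \
#     && echo "proxied by Cloudflare" || echo "no Cloudflare headers seen"
```

The absence of these headers is a hint rather than proof, since DNS changes can take some time to propagate.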