gyselroth / balloon

High performance, feature rich document management system written for the cloud
GNU General Public License v3.0

Upload of big file (~7GB) ends in 504 Gateway Timeout due to slow md5 sum calculation #382

Closed raffis closed 4 years ago

raffis commented 4 years ago

Describe the bug

Uploading a big file (~7GB) in chunks via the chunk API v2 endpoint ends in a 504 error, but after some time the file exists anyway. The checksum is correct as well.

To Reproduce

  1. Upload a 7GB file via the web UI
  2. The request ends in 504 Gateway Timeout

Expected behavior

Should end in 200 OK.

Environment

Additional context

The MongoDB filemd5 command that generates the checksum took ~120s to finish: db.runCommand({ filemd5: ObjectId("5d946dcecc69ec00074fcc8e"), root: "fs" })

The problem is that the upstream server does not wait for this action to finish, because the php-fpm request times out. This would not be an issue with #338. The default FastCGI read timeout for nginx is 60s (http://nginx.org/en/docs/http/ngx_http_fastcgi_module.html#fastcgi_read_timeout).

raffis commented 4 years ago

I see only these solutions:

  1. Increase the default fastcgi_read_timeout to at least 5min. But then we would still end up with a 504 if an even bigger file gets uploaded, despite chunking.
  2. A new checksum implementation: compute a checksum for each chunk, then a checksum over all chunk checksums. This has several disadvantages: the chunk size must always be the same, and non-balloon clients can't get the checksum as easily as with md5file.
  3. A nice solution could be to implement the md5 algorithm manually, build it up over all chunks, and store the state between requests.
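
To illustrate the trade-off of option 2, here is a minimal Python sketch (not balloon code). The helper name `checksum_of_chunk_checksums` and the sample payload are made up for illustration. It shows that a checksum over chunk checksums differs from the plain md5 of the file and changes whenever the chunk size changes, which is why every client would have to agree on one chunk size.

```python
import hashlib


def checksum_of_chunk_checksums(data: bytes, chunk_size: int) -> str:
    """Hypothetical helper: md5 over the concatenated md5 digests of each chunk."""
    outer = hashlib.md5()
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Each chunk is hashed on its own; only its 16-byte digest feeds the outer hash.
        outer.update(hashlib.md5(chunk).digest())
    return outer.hexdigest()


payload = bytes(range(256)) * 4096  # 1 MiB of sample data

plain = hashlib.md5(payload).hexdigest()
by_64k = checksum_of_chunk_checksums(payload, 64 * 1024)
by_128k = checksum_of_chunk_checksums(payload, 128 * 1024)

# The combined checksum is neither equal to the plain md5 nor stable across chunk sizes.
print(plain == by_64k, by_64k == by_128k)
```

Each request only needs to hash its own chunk, so the per-request cost stays bounded, but the result is no longer comparable with a standard md5 of the file.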

raffis commented 4 years ago

> 3. A nice solution could be to implement the md5 algorithm manually and build it somehow overall chunks and store the state somehow between requests.

Sadly this is too slow in PHP. https://gist.github.com/raffis/3362374991ed1493abd5ebcc3d465cf0#file-php takes 17s with PHP7.4 (JIT) for an 8MB file.

raffis commented 4 years ago

Increasing the default read timeout from 60s to 300s for now (fastcgi_read_timeout 300;). This needs to be changed in the helm repository as well.

raffis commented 4 years ago

Thanks to https://github.com/gyselroth/php-serializable-md5, there will be a fix in balloon 2.7.

raffis commented 4 years ago

> 3. A nice solution could be to implement the md5 algorithm manually and build it somehow overall chunks and store the state somehow between requests.

I've written a PHP extension that exposes a serializable md5 hash context. See https://github.com/gyselroth/php-serializable-md5.
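
The idea behind such a hash context can be sketched in Python. This is an illustration only and does not show the extension's API. Feeding chunks one after another into a single md5 context gives the same digest as hashing the whole file at once, whatever the chunk size. The function name `md5_over_chunks` and the sample data are assumptions. The sketch keeps the context in memory within one process. As far as I know, Python's `hashlib` objects cannot be pickled, so persisting the state between requests is exactly the part a serializable context has to add.

```python
import hashlib


def md5_over_chunks(chunks) -> str:
    """Hypothetical helper: feed chunks sequentially into one md5 context."""
    ctx = hashlib.md5()
    for chunk in chunks:
        # Each update continues from the state left by the previous chunk.
        ctx.update(chunk)
    return ctx.hexdigest()


def split(data: bytes, chunk_size: int):
    """Cut data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = b"balloon" * 100_000  # sample payload standing in for an uploaded file

whole = hashlib.md5(data).hexdigest()
chunked_64k = md5_over_chunks(split(data, 64 * 1024))
chunked_10k = md5_over_chunks(split(data, 10_000))

# Both comparisons hold: the digest does not depend on how the data was chunked.
print(whole == chunked_64k, whole == chunked_10k)
```

With this approach the final request only has to finalize an already updated context, so no single request has to re-read the whole file.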