googleapis / nodejs-storage

Node.js client for Google Cloud Storage: unified object storage for developers and enterprises, from live data serving to data analytics/ML to data archiving.
https://cloud.google.com/storage/
Apache License 2.0

Chunked resumable uploads are at least 5x slower than normal upload #2151

Closed Chethan-sn closed 1 year ago

Chethan-sn commented 1 year ago

Hi Team,

I am trying to use chunked resumable uploads with the Node.js SDK for large file uploads (> 10 GB). Compared to a normal single-request upload, the upload rate drops sharply (at least 5x slower). I have tried chunk sizes ranging from 5 MB to 32 MB, and it appears to make no difference.

```js
const fs = require('fs');
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();
const bucket = storage.bucket('');
const file = bucket.file('1 GB.txt');
const chunkSize = 32 * 1024 * 1024; // 32 MiB

const uploadChunk = () => {
  const chunk = fs.createReadStream('file path'); // this could be stream from UI in prod
  const options = {
    resumable: true,
    uri, // pass it from createResumableUpload method called before this method
    // chunkSize: chunkSize // uncomment for chunked upload
  };
  const writeStream = file.createWriteStream(options);
  chunk.pipe(writeStream);

  writeStream
    .on('error', (err) => {
      console.error('Error uploading chunk:', err);
    })
    .on('finish', () => {
      console.log('Upload complete!');
      console.log('end time --- ', new Date());
    })
    .on('response', (resp) => {
      console.log('status code', resp.status);
    });

  chunk.on('end', () => {
    writeStream.end();
  });
};
```

Environment Details:

- Node.js: 16.17.1
- Network speed: 100 Mbps
- File size tested: 1 GB
- "@google-cloud/storage": "6.9.2"
- Approx. time taken with single and chunked upload respectively: 2 mins and 10 mins

This was measured on a local machine with a good internet connection; production environments would add other latencies. On slow networks the upload would take longer than 40 mins for a 1 GB file, which forces us to increase the request timeouts, and that is not desirable.

Please let me know if there is any way to improve the performance of chunked uploads, or if there is an alternative solution for large file uploads with the resumable option. Also, please let me know if I'm missing anything or if more information is needed.

ddelgrosso1 commented 1 year ago

Hi @Chethan-sn, when you specify a chunk size, the library makes a new HTTP request for every chunk. In your example above, it appears you are uploading a 1 GB file with chunk sizes between 5 MB and 32 MB. This would result in between 32 calls (1024 MB / 32 MB = 32) and 205 calls (1024 MB / 5 MB = 204.8, rounded up). As the file size grows, so does the number of calls. If a chunked upload is needed in your application, I would suggest tuning the chunk size to something appropriate for the environment.
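
The request counts quoted above can be reproduced with a small calculation. This is only a sketch of the arithmetic, assuming one HTTP request per chunk as described in the comment; `chunkRequestCount` is a hypothetical helper name, not part of `@google-cloud/storage`, and the 256 MiB entry is an illustrative larger chunk size, not a recommendation from the thread.

```javascript
// Estimate how many HTTP requests a chunked resumable upload issues,
// assuming one request per chunk (hypothetical helper, not a library API).
const MiB = 1024 * 1024;

function chunkRequestCount(fileSizeBytes, chunkSizeBytes) {
  if (!Number.isInteger(chunkSizeBytes) || chunkSizeBytes <= 0) {
    throw new RangeError('chunkSizeBytes must be a positive integer');
  }
  // The final chunk may be partial, so round up.
  return Math.ceil(fileSizeBytes / chunkSizeBytes);
}

const oneGiB = 1024 * MiB;
for (const chunkMiB of [5, 32, 256]) {
  const requests = chunkRequestCount(oneGiB, chunkMiB * MiB);
  console.log(`${chunkMiB} MiB chunks -> ${requests} requests`);
}
```

Each request carries its own round-trip cost, so on a link with noticeable latency a larger chunk size should reduce the total overhead, at the price of more data to resend if a chunk fails.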

Chethan-sn commented 1 year ago

Hi @ddelgrosso1, thanks for the clarification. I would like your insight on one approach.

We have our services set up as below.

Client(web browser) ----> our node server -------> GCS

We would like to keep the connection time for one request between the client and the server under 20 mins. This creates a problem for large files, so I wanted to see whether resumable chunked upload could help us here. Is it possible to stream as much as possible until the timeout, then create a new request with the previously created URI, so that the remaining data is streamed from the client in subsequent calls until the whole file is uploaded? Or is it necessary to have the entire file on the server before starting the upload to GCS?

Please let me know if you have any suggestions on this.
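
One way to picture the "continue from where the previous request stopped" idea is through the GCS resumable upload protocol: a status check against the session URI (a `PUT` with `Content-Range: bytes */*` and an empty body) is answered with `308` and, once some bytes have been persisted, a `Range: bytes=0-N` header naming the last stored byte. The sketch below only parses that header to find the next byte the client would need to send. `nextOffsetFromRange` is a hypothetical helper name, and passing the result as the `offset` option next to `uri` in `createWriteStream` is an assumption to verify against the library version in use.

```javascript
// Derive the next byte offset from the Range header of a 308 status response.
// Hypothetical helper; the header format follows the GCS resumable upload docs.
function nextOffsetFromRange(rangeHeader) {
  // No Range header means nothing has been persisted yet.
  if (rangeHeader === undefined || rangeHeader === null || rangeHeader === '') {
    return 0;
  }
  const match = /^bytes=0-(\d+)$/.exec(String(rangeHeader).trim());
  if (!match) {
    throw new Error(`Unexpected Range header: ${rangeHeader}`);
  }
  // Range is inclusive, so the next byte to send is one past the end.
  return Number(match[1]) + 1;
}

console.log(nextOffsetFromRange('bytes=0-42'));
console.log(nextOffsetFromRange(undefined));
```

With the offset known, the server could ask the browser to resend the file starting at that byte in a fresh request, instead of holding one long-lived connection open.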

ddelgrosso1 commented 1 year ago

Are you doing any kind of processing on the file in Node? If not, you might look into using a signed URL and uploading the file directly from the browser. Otherwise, if you are going to stream the file from the browser to Node to GCS, I would suggest not setting a chunk size and just using the default resumable upload options.
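
A minimal sketch of the signed URL route, for the case where no server-side processing is needed. It assumes `file` is a `File` object from `@google-cloud/storage` (for example `new Storage().bucket('my-bucket').file('upload.bin')`, where both names are placeholders); `createUploadUrl` and the 15-minute default are hypothetical choices for illustration.

```javascript
// Ask GCS for a V4 signed URL that permits a single PUT of the object.
// `file` is assumed to be a File from @google-cloud/storage; the helper
// name and the default validity window are illustrative only.
async function createUploadUrl(file, contentType, minutesValid = 15) {
  const [url] = await file.getSignedUrl({
    version: 'v4',
    action: 'write',
    expires: Date.now() + minutesValid * 60 * 1000,
    contentType, // the browser must send the same Content-Type header
  });
  return url;
}
```

The browser would then `PUT` the file body to the returned URL with a matching `Content-Type` header, so the bytes never pass through the Node server.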

Chethan-sn commented 1 year ago

Thanks for the input. Yes, we are doing additional processing in Node. I'll try not setting a chunk size.

ddelgrosso1 commented 1 year ago

I'm going to close this out. If there are any other issues or questions, please feel free to reopen or create a new issue.

ddelgrosso1 commented 1 year ago

This has been addressed in v6.10.1, where performance improvements were made to resumable uploads with `chunkSize` specified.