pkg / sftp

SFTP support for the go.crypto/ssh package
BSD 2-Clause "Simplified" License

file uploads are very slow compared to java jsch library #472

Closed. karthikrap closed this issue 2 years ago.

karthikrap commented 3 years ago

Uploading a 50 MB file using the sftp module took 237 s, while Java jsch takes only 24 s. My observation is that most of the time is spent writing data to the remote file through the stream. I tried different ways of writing to the stream (which I kept in comments in main.go), but none improved performance.

Attaching Java and Go samples with source code (src.zip) and binaries (bin.zip). Instructions on how to execute the binaries are documented in README.txt in bin.zip.

drakkan commented 3 years ago

Hi,

please try enabling ConcurrentWrites.
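
For reference, it is a client option. A minimal sketch, assuming an established *ssh.Client named sshConn, with error handling kept short:

```go
package example

import (
	"io"
	"os"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// upload copies a local file to the remote path with concurrent writes.
func upload(sshConn *ssh.Client, local, remote string) error {
	// UseConcurrentWrites lets the client keep multiple write packets
	// in flight instead of waiting for each acknowledgement.
	client, err := sftp.NewClient(sshConn, sftp.UseConcurrentWrites(true))
	if err != nil {
		return err
	}
	defer client.Close()

	src, err := os.Open(local)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := client.Create(remote)
	if err != nil {
		return err
	}
	defer dst.Close()

	// io.Copy uses File.ReadFrom, which issues the concurrent write packets.
	_, err = io.Copy(dst, src)
	return err
}
```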

karthikrap commented 3 years ago

I tried the upload with "ConcurrentWrites" and now the upload time is almost the same as Java jsch. I have a few questions:

1. If I use "ConcurrentWrites", upload times are not consistent like they are in Java jsch. On average a 50 MB upload takes 24 s, but sometimes it takes 60 s, while Java jsch takes at most 30 s.

2. The doc for "ConcurrentWrites" says: "A write to a later offset in a file after an error, could end up with a file length longer than what was successfully written." If I force-terminate an upload, the partially uploaded file has a few KB of corrupted data at the end. Does this mean I cannot use it to implement resumable uploads? In what cases should I not use "ConcurrentWrites"?

3. The Java jsch library writes data sequentially, which I saw in its source code. In Go's ssh module, are reads and writes to the SSH stream slower, or why exactly is writing to the stream without "ConcurrentWrites" slower?

drakkan commented 3 years ago

> I tried the upload with "ConcurrentWrites" and now the upload time is almost the same as Java jsch. I have a few questions:
>
> 1. If I use "ConcurrentWrites", upload times are not consistent like they are in Java jsch. On average a 50 MB upload takes 24 s, but sometimes it takes 60 s, while Java jsch takes at most 30 s.

It could be related to the concurrency level. For example, take a look here: try to understand which concurrency value ends up being used in your use case; you can change this value by sending buffers of different sizes.
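
To illustrate (a hedged sketch: these are real pkg/sftp ClientOptions, but the values are only examples, not tuned recommendations):

```go
client, err := sftp.NewClient(sshConn,
	sftp.UseConcurrentWrites(true),
	// Each SFTP write request carries up to MaxPacket bytes of payload.
	sftp.MaxPacket(32*1024),
	// Upper bound on write requests in flight for a single file.
	sftp.MaxConcurrentRequestsPerFile(64),
)
if err != nil {
	return err
}
```

Larger buffers passed to a single Write call similarly translate into more packets dispatched at once, which is the "buffer size" effect mentioned above.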

> 2. The doc for "ConcurrentWrites" says: "A write to a later offset in a file after an error, could end up with a file length longer than what was successfully written." If I force-terminate an upload, the partially uploaded file has a few KB of corrupted data at the end. Does this mean I cannot use it to implement resumable uploads? In what cases should I not use "ConcurrentWrites"?

Concurrent writes are not safe if you want to implement resumable uploads: if your application crashes, and so you cannot truncate the file to the uploaded size, a subsequent resume will produce an inconsistent file.
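
If you need resume semantics anyway, a hedged sketch of the safe pattern (client, src, and remote as in the earlier sketch; resumeOffset is a hypothetical byte count your application persisted for data it knows was fully acknowledged):

```go
// Discard any trailing bytes a crashed concurrent upload may have
// written past the offset we know was fully acknowledged.
if err := client.Truncate(remote, resumeOffset); err != nil {
	return err
}

dst, err := client.OpenFile(remote, os.O_WRONLY)
if err != nil {
	return err
}
defer dst.Close()

// Position both ends at the resume point and continue the copy.
if _, err := dst.Seek(resumeOffset, io.SeekStart); err != nil {
	return err
}
if _, err := src.Seek(resumeOffset, io.SeekStart); err != nil {
	return err
}
_, err = io.Copy(dst, src)
```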

> 3. The Java jsch library writes data sequentially, which I saw in its source code. In Go's ssh module, are reads and writes to the SSH stream slower, or why exactly is writing to the stream without "ConcurrentWrites" slower?

Concurrent writes help on high-latency networks (on local networks there is no difference). I haven't looked at the Java jsch code, but it likely uses some form of concurrency to achieve that performance.
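
For a rough sense of the latency math: with purely sequential 32 KiB writes, each packet must be acknowledged before the next is sent, so on a link with a 100 ms round trip, throughput is capped near 32 KiB / 0.1 s ≈ 320 KiB/s, and a 50 MB upload takes on the order of 160 s, the same order of magnitude as the 237 s you reported. With 64 writes in flight, the same link can sustain roughly 64 × 320 KiB/s ≈ 20 MiB/s. (Illustrative figures only, assuming the default 32 KiB packet size.)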

karthikrap commented 3 years ago

@drakkan Thanks for your quick response, it helped me a lot. One thing, though: Java jsch uploads are sequential. If I forcefully terminate an upload, the partially uploaded file is not corrupted.

puellanivis commented 3 years ago

Your jsch is almost certainly not doing only sequential uploads; it is doing concurrent uploads. (This is evident from the speed of the upload.) To get the speeds you’re seeing, it must be transferring data in parallel, ahead of the actual request.

There are some new considerations of how to process SFTP requests that could make ConcurrentWrites safer, but this would be for a future implementation of the client.

We took a deliberately conservative approach to concurrency here, prioritizing safety over potential issues.

karthikrap commented 3 years ago

Thanks @puellanivis. It would be helpful if ConcurrentWrites did not corrupt the file when uploads are interrupted.

puellanivis commented 3 years ago

Unfortunately, we’re not able to ensure that ConcurrentWrites would not corrupt the file in an error situation. I think a few changes are being made (sequential request ordering) that would make it safer, but I’d have to look at things afresh.

drakkan commented 2 years ago

> Thanks @puellanivis. It would be helpful if ConcurrentWrites did not corrupt the file when uploads are interrupted.

This was fixed in #482; please reopen if it still does not work as expected.