realtymaps / promise-sftp

a promise-based sftp client for node.js
MIT License
16 stars 11 forks source link

Bug fix in put #2

Closed kriswest closed 8 years ago

kriswest commented 8 years ago

If using a readableStream for a put operation, it would fail with:

Tue, 28 Jun 2016 12:20:13 GMT ringr:enterpriseApiCtrl   name:         Error
    message:      Cannot pipe. Not readable.
    stack:        Error: Cannot pipe. Not readable.
    at WriteStream.Writable.pipe (_stream_writable.js:161:22)
    at /data/ringr-api/node_modules/promise-sftp/dist/promiseSftp.js:368:20

This was due to trying to pipe the writableStream into the readableStream:

stream.pipe(input);

reversed that and its fine:

input.pipe(stream)
zacronos commented 8 years ago

Oops, thanks!

kriswest commented 8 years ago

Still not working unfortunately - we'll look at it again today. Passing in a stream results in a zero byte upload, where using a local file works fine. Same stream works fine in promise-ftp...

kriswest commented 8 years ago

@zacronos We had a deeper look at the put command and roped in a colleague @dtwright.

Dan Wright: the most progress I've made with FTP is that promise-sftp uses a totally different upload method in sftp.js, depending on whether the input is a string (local file) or a stream so my guess is that the problem lies in there somewhere – not at all sure where

The same stream passed into promise-ftp and on to (node-)ftp definitely works (for reference we're grabbing something from S3 with getObject, turning the response into a stream and passing to FTP, preferably avoiding temp file management by using a node pipe). However, it closes out immediately in promise-sftp and ends up storing a 0 byte file on the sftp server.

We noted that ftp calls pause on the streams it is passed, and then resume after piping them. We added that just in case but it didn't help.

If you have any ideas on where to look to solve that we'd love the help - we believe it is essentially broken for ReadableStreams right now, but does appear to work with Buffers. We haven't test string inputs.

zacronos commented 8 years ago

Try changing the "ssh2": "git://github.com/realtymaps/ssh2.git" dependency to "ssh2": "0.5.x" and see if that helps. I made a fork of ssh2 to work around a disconnect bug, but it was fixed a few months ago. Maybe work on ssh2 since then has fixed the bug you are encountering. (That's where the sftp.js file comes from.)

kriswest commented 8 years ago

@zacronos No joy sadly. I wonder if its related to these two issues reports on ssh2...

zacronos commented 8 years ago

At this point I can't rule that out -- but it would be disappointing, since that would make it much harder to fix.

Do you also see the problem if you try to put a local file by doing promiseSftpClient.put(fs.createReadStream(localFileName), destination) instead of promiseSftpClient.put(localFileName, destination)?

Another thing to try is adding an error handler on your S3 stream before passing it in to promiseSftpClient.put(). If there is an error occurring there, then any promiseSftp methods that receive streams might need to have some stream-error-catching logic adding into continueLogicFactory().

kriswest commented 8 years ago

Thanks for the pointer on stream error handling.

Good idea on fs.createReadStream(localFile), that does duplicate the issue (put returns too quickly for my 100mb test file and actually uploads 0 bytes), where passing in the path to the local file works fine.

zacronos commented 8 years ago

Ok, so that tells us it isn't a timing issue, and probably not related to a stream error coming from S3 -- if that were the case, I would expect the problem to go away when using fs.createReadStream(localFile).

Hmm, the way you worded that ("put returns too quickly...") makes me think of something else. The way this module currently works, pretty much any method will return immediately. The delay waiting for a prior command to finish actually takes place when the next command is initiated. This was intentional, so that code which looks like:

promiseSftpClient.put(localFileName, destination)
.then () ->
  doSomethingThatMightTakeTime()
.then () ->
  promiseSftpClient.somethingElse(...)

would be able to work in efficient parallel fashion -- the upload from the put will take place while doSomethingThatMightTakeTime() executes, and there won't be a delay until we try to do something else with the SFTP client (possibly shortening or altogether eliminating the delay).

There are a few issues though, which I never realized until now:

To test if this is the cause of your problem, I suggest putting another command after your .put() (such as .list()), before you logout or do anything that checks the result of the upload. If that seems to fix the issue, then we've solved the mystery.

I think to handle all of this ideally, I would want to:

  1. fix .end() and .logout() to wait for prior commands to finish
  2. add a config option to toggle between wait-to-delay behavior and delay-immediately behavior
  3. add documentation about both the wait-to-delay behavior (and explanation for why it is useful), as well as the delay-immediately behavior
  4. add delayed stream-error reporting when using wait-to-delay, and immediate stream-error reporting when using delay-immediately
  5. add a new .wait() method, which does nothing but wait for command completion when using wait-to-delay, and is a no-op wen using delay-immediately
  6. possibly make delay-immediately the default behavior, in which case I would need to bump the version number to 0.10.x

If this is indeed the problem you're facing, then probably either 1) or 5) would be sufficient to get you moving again.

kriswest commented 8 years ago

I think you're definitely onto something. By adding a list() call before .end() the first 256kb of the file gets uploaded before end() runs and cuts it off.

ftp.connect(options)
.then(function (serverMessage) {
    debug("upload_from_local_file: Server message:\n" + serverMessage);
    return ftp.put(local_path_or_stream, upload_path);
}).then(function () {
    return ftp.list();
}).then(function (objArr) {
    debug("upload_promise: ftp server listing: " + JSON.stringify(objArr));
    return ftp.end();
}).then(function () { //...more app logic

Hence, 1 or 5 would very likely get us unstuck. There is a usecase for .end() to cut the connection immediately, but you might use an optional argument for that and have the default behaviour wait and return when all ops are complete (I assume thats the default behaviour in promise-ftp - having them match would be a great help). Adding 5 ( wait() ) when people really need to breakout of wait-to-delay but not disconnect from the server would no doubt be very helpful.

Thank you for your help and insight on this so far, its very much appreciated!

zacronos commented 8 years ago

The case where someone wants to cut the connection immediately is handled by .destroy(), so .end()/.logout() should definitely wait.

Hmm, I expected using .list() either to completely fix the problem, or else do nothing. This points to perhaps another issue, on top of the list I made earlier.

Looking at the SFTPStream docs and then back at my code, it seems that I need to add some additional logic so that the stream-based methods watch for a 'finish' event before they are considered complete. That will be needed in addition to 1) or 5). I'll see if I can find some time tonight to work on this.

zacronos commented 8 years ago

Ok, I've done some work on a put_fix branch -- it handles 1), 5), and the 'finish' logic.

Unfortunately, I don't have a writable SFTP endpoint to use for testing (which is the reason .put() was so buggy). Could you try with that branch and see if that fixes it for you? I don't expect you'll need to use the wait() function, but it's there anyway. In your package.json, you can set your dependency version to "promise-sftp": "git://github.com/realtymaps/promise-sftp#put_fix" to use the branch.

Once we're happy with it, I'll document and merge the fix into the master branch.

kriswest commented 8 years ago

Excellent, end() is definitely waiting appropriately now. Both the upload from a local file (via a stream) and streamed from the S3 response uploaded in full.

However, the uploads were extremely slow for some reason. I've taken a few measurements (from promise-ftp too for comparison):

filesize: 90,478,474 (87mb) Uploading from AWS US-EAST-1 to EU-WEST-1

Using the local steam is around 8x slower than pointing the local file, while the S3 stream is 61x slower... Again I don't think its the stream's fault due to the comparison with promise-ftp. Any ideas?

I'll check with the team, we should be able to give you access to our ftp/ftps/sftp endpoints which we setup and secured just for testing use. They are behind a white list tho so I'll need a (pref. static) IP.

Again, thx for your help on this! We'll see what we can do about polishing up wrapper code to share. We're only really using put at the moment (exporting files processed by our API) but could expose the common functions from both libs and provide access to the underlying objects for the library specific functions.

K

P.S. checked that it installed and used ssh2 0.5.0 so ditched our local shrinkwrap that forced that.

On 1 July 2016 at 06:37, Joe Ibershoff notifications@github.com wrote:

Ok, I've done some work on a put_fix branch -- it handles 1), 5), and the 'finish' logic.

Unfortunately, I don't have a writable SFTP endpoint to use for testing (which is the reason .put() was so buggy). Could you try with that branch and see if that fixes it for you? I don't expect you'll need to use the wait() function, but it's there anyway. In your package.json, you can set your dependency version to "promise-sftp": "git:// github.com/realtymaps/promise-sftp#put_fix" to use the branch.

Once we're happy with it, I'll document and merge the fix into the master branch.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/realtymaps/promise-sftp/pull/2#issuecomment-229858045, or mute the thread https://github.com/notifications/unsubscribe/ABn3hNiUrpl5MADqEaZO2LdmC6CjQfXPks5qRKejgaJpZM4JADyH .

Regards,

Dr Kris West

Tel (UK): +44 (0)203 289 4553 Support hotline: +44 (0)203 397 4093

Skype: kriswest

zacronos commented 8 years ago

Well that's certainly unexpected! I would expect the promise-sftp local file method to be the fastest of all the methods, since it uses .fastPut() under the hood to upload via multiple concurrent TCP streams.

I think the next step will be to try promiseSftpClient.fastPut() directly and see how it does in comparison. I have a couple ideas for what to do after that, but let's make sure promiseSftpClient.fastPut() is as fast as I expect it to be first.

Great, access to an SFTP testing endpoint would make this a lot easier. Please contact me at the email address in the package.json file, and we can deal with static ip and creds.

Any wrapper contributions you can make would be great! A rising tide lifts all boats, as they say; the more everyone shares their non-mission-critical code, the better the world is.

kriswest commented 8 years ago

nope fastPut's no different:

promise-ftp (ftp) upload_duration: 7.781 secs promise-sftp upload_duration: 15.193 secs

I assume Dan's been in touch with access details for that server - shout if you have any trouble with it.

Will do what I can on the wrapper, it'll be a week or so before I have time to tidy it up and package it. Am fairly sure I'll use it again even if no-one else does ;-)

mscdex commented 8 years ago

@kriswest Interesting performance numbers. Out of curiosity, what version of node are you using?

kriswest commented 8 years ago

@mscdex We're running v4.4.7