theophilusx / ssh2-sftp-client

a client for SSH2 SFTP
Apache License 2.0
803 stars 196 forks source link

client put ends in dangling promise #546

Closed v-jung closed 1 month ago

v-jung commented 1 month ago

Hi, First of all, thanks for maintaining the lib and putting the effort into it.

This might be the same issue as as for 542 but as it is another method I think it makes sense to open a new issue.

I don't own the server and I can't tell you if the server is following the correct protocol and I also don't have access to the server logs. But the root cause is most probably one of the mentioned ideas here: link

Now, what I see is the following:

 // init client with debugging

      await client.connect({
            debug: (s) => {
                console.log(`${debugCount} client-debug: ${s}`);
                debugCount++;
            },
            host: server[0],
            password: credentials[1],
            port: parseInt(server[1]),
            username: credentials[0],
        });

       // setup readable stream
       const mediafile: NodeJS.ReadableStream;
       // load a file into the readable stream 
       // ....

       // put will not resolve nor reject if the data is big enough and the upload takes some time.. then the 
       // server seem to do some shenanigans
        try {
            await client.put(mediafile, sftpPathMediafile);
            console.log("done");
        } catch (e) {
            console.log(e);
        }

Logs after some time:

15105 client-debug: SFTP: Inbound: Received STATUS (id:3761, 0, "")
15106 client-debug: Outbound: Sending CHANNEL_DATA (r:971392, 16447)
15107 client-debug: SFTP: Outbound: Sent WRITE (id:3762)
15108 client-debug: Inbound: CHANNEL_DATA (r:0, 26)
15109 client-debug: SFTP: Inbound: Received STATUS (id:3762, 0, "")
15110 client-debug: Socket ended
15111 client-debug: CLIENT[sftp]: _put Unexpected end event
15112 client-debug: CLIENT[sftp]: Global end event: Handling unexpected event
15113 client-debug: Socket closed
15114 client-debug: CLIENT[sftp]: _put closeListener - handled close event
15115 client-debug: CLIENT[sftp]: Global close event: Handling unexpected event

And that's it. Then the service is stuck forever.

So what we see is:

So even if we see some kind of unexpected issue here, the library doesn't reject. If this is by design I would like to question it.

v-jung commented 1 month ago

Here are some Ideas what the reason could be and how to solve it: Please have a look at the _put method:


 _put(lPath, rPath, opts, addListeners = true) {
    let listeners, wtr, rdr;
    return new Promise((resolve, reject) => {
      if (addListeners) {
        listeners = addTempListeners(this, '_put', reject);
      }
      opts = {
        readStreamOptions: { ...opts?.readStreamOptions, autoClose: true },
        writeStreamOptions: { ...opts?.writeStreamOptions, autoClose: true },
        pipeOptions: { ...opts?.pipeOptions, end: true },
      };
      if (haveConnection(this, '_put', reject)) {
        wtr = this.sftp.createWriteStream(rPath, opts.writeStreamOptions);

In the described issue I would expect this function to reject if the ssh library got an

This function here, however only rejects if:

The wtr writeStream emits an error event OR if one of the listeners in listeners = addTempListeners(this, '_put', reject); calls the reject callback.

The addTempListeners only call the reject callback if they receive an "error" event. But what we get here is an end event. This only causes the logging of:

So what we have is: the sftp client is basically in an "ended" state. But the writeStream is still not errored or closed and hereby doesn't reject the promise.

The easy solution would be to also reject on "end" in this case. But I fear then we still have a memory leak with an unclosed writeStream.

I guess what could solve the issue is to call: wtr.destroy(new Error("unexpected end detected on ssh socket")) if we detect an unexpected end.

This again should trigger an error on the wtr Writestream which in turn should cleanup the stream itself and trigger the reject on the promise.

Please let me know you thoughts and if I can somehow help.

theophilusx commented 1 month ago

Unfortunately the issue is a little more subtle and difficult to solve than it seems on the surface. I would agree it is the server you are using which is the problem. However, I'm not sure there is a clean way of handling this within the module which doesn't cause issues for users who are using a compliant well behaving server. It should be noted that initially, the default end and close listener handlers DID reject/throw, but that caused problems for users using compliant servers where errors were reported for normal operation. It was why the handler was changed.

Important points to note -

      * You cannot rely on event order. An end event can be seen by
        client code before an error event.
      * Only the error event contains any error cause information
      * You can only reject/resolve a promise once
      * Both end and close events are 'normal' events and not exceptions. 

The key thing to note is that end and close events are not exceptions. These are normal information events and treating them as exceptions/errors will generate false positives during normal operation. To make it even more complicated, you cannot guarantee what order your code will respond to events. It is possible for an end listener to respond before an error listener. Problem with this is that only the error listener has any error cause information. If we reject a promise based on an end event, we cannot provide any details on why the promise was rejected. This leaves the client scratching their head wondering why their put(), get() etc failed.

In a nutshell what we have here is a server which is closing a connection without issuing any error or reason for the closure. The problem then becomes, how do we distinguish between a 'normal' end/close event and one which is an exception. At this point, things begin to get messy and we start to rely on heuristics, which are sometimes incorrect. Things become worse once you also factor in you cannot rely on event order in your heuristics.

I'm still not satisfied with the code handling all of this. The best I can say about it is that it works well when you are using a well behaved compliant sftp server. I would like it to work wsell with non-compliant servers, but not at the expense of performance or reliability on compliant servers. There are some changes I'm currently evaluating, but by definition, it is difficult to test against non-compliant servers because by their definition, being non-compliant means you cannot rely on their behaviour, limiting testing confidence.

I will up0date this issue once I have pushed some relevant changes. However, this won't be for a while as I have a lot of other work on my plate right now and need to prioritise income generating activities.

theophilusx commented 1 month ago

Unfortunately, as you will probably realise from my other message, it is far more complex than it appears on the surface and your proposed solutiolns won't suffice (I initially started with similar naive solutions. You can go back and look at earlier versions to see how things have evolved).

Another complication I forgot to mention is that you also have a timing issue when trying to make an essentially event driven API have a promise based wrapper. The problem is you have no control over when events are raised or received. For example, you might have a error, end or close event raised outside of a call to one of the methods such as put(), get() etc when no promises are active or you could have the an event raised just after the code in a try block completes. Of course, all of this gets eben more complicated because you do want a promise to be rejected if an error event occurs during the fulfilment of a promise, so the event listeners need to be linked with the specific promise i.e. the specific accept/reject functions of a particular promise but you also need to limit hows many listeners are in place at any time to prevent memory leaks or the too many listeners warning. This means you have to actively manage adding/removing of listeners. Unfortunately, this also introduces a race condition as it is possible an event can occur in the gap between when you remove one set of listeners and add the next set. Also, as there is no guarantee of when an event listener fires, you might remove listeners just before or during an event arrival.

The reality is, wrapping an event based API with promises is error prone and for really reliable results, you really should just use the event based API. Problem is, few JS programmers really understand event driven models and much prefer a promise/async callback model. I find it somewhat amusing I'm now seeing blog posts about the 'new' event driven model, which was the dominate model when I started programming in the 80s!

devgk882 commented 1 month ago

@theophilusx did you mean we should use event based implementation like this ?

const fs = require('fs');
const stream = fs.createWriteStream(localPath);

sftp.get(path)
  .on('data', (chunk) => {
    stream.write(chunk);
  })
  .on('end', () => {
    console.log('Download complete');
    stream.end();
  })
  .on('error', (err) => {
    console.error('Error during SFTP download:', err);
    stream.close();
  });
theophilusx commented 1 month ago

No, I was trying to explain some of the limitations associated with trying to wrap an event based API inside a promise based one. You should study the node documentation in the 'Events' section of the node documentation - especialloy the section on Asynchronous v synchronous and Error events.

The real problem here is that your sftp server is non-compliant and is closing the connection without issuing any error. However, it is a little weird it just hangs for you. I get a socket timeout error (eventually, it takes a while).

Anyway, I've made some changes which should help. However, they are breaking changes which will require a new version to be pushed out and that will take some time as the documentatioln will need to be updated as well as all the examples.

There are three main changes -

These changes are already implemented in the head of the repo, so feel free to try them out. I will probably push a new version 11.0.0 sometime in August.

v-jung commented 1 month ago

Thank you for all the detailed explanations. A very interesting read. And for me it shines a new light on wrapping events in promises.

I also have some new findings on my end with the external sftp server. The sftp server has a pretty short timeout. Meaning it sends an end event a short time after it detects that no data is transferred.

That no data is transferred is somehow a read-write stream issue (still not sure why, still investigating). But at least it explains what's going on. The read stream stops reading - or the sftp writestream stops writing(for whatever reason), and no data is transferred for some time. The sftp server detects that the ssh connection is pretty much unused and after a short timeout it sends an end event.

ssh is closed, Readstream and writestream are not in a final state and dangle forever. Then the sftp lib promise is dangling as well as it waits for the writestream.

For me the workaround right now is to read everything into a buffer first (using the very same readstream). And then I pass the buffer to the sftp librarys put method - works like charm as the data sizes I'm working with are not too big. Still I would prefer to directly pass the stream.

I will definitely check out your changes. Again, thanks a lot!

theophilusx commented 1 month ago

The changes in what will become v11 should address the issue you are having. The remote sftp server should be sending a timeout error event and then ending the connection. It isn't sending the error, which causes the code to hang. I have changed how event listeners work in v11 so they will do a reject if an end or close event is unexpectedly seen. There are some minor consequences for this, but not as bad as just hanging. The minor consequence is that in some situations, client code will know the connection has been ended, but the reason (error info ) will be 'lost', leaving you scratching your head as to the reason.

theophilusx commented 1 month ago

As the new version changes how both promises and global event listeners operate, this issue is both no longer current and beleived to be addressed, so closing this issue.