mscdex / node-ftp

An FTP client module for node.js
MIT License

.Pipe's Flow Control fails on slow Writeable Stream #70

Open solarflare045 opened 10 years ago

solarflare045 commented 10 years ago

It appears the readable stream is automatically cleaned up when the library has finished receiving a file. This can be a problem when using .pipe() to a slower stream (such as an HTTP response).

The best way to replicate this is to download a file over FTP and pipe it to an HTTP response, where the HTTP connection is considerably slower than the FTP connection.
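
For anyone trying to reproduce this, here is a minimal sketch of that scenario; the file path and port are placeholders, and it assumes an FTP server that is much faster than the HTTP client reading the response.

var Client = require('ftp');
var http = require('http');

http.createServer(function(req, res) {
    var client = new Client();
    client.on('ready', function() {
        client.get('/www/some-large-file.bin', function(err, stream) {
            if (err) {
                res.statusCode = 500;
                return res.end(err.message);
            }
            stream.once('close', function() { client.end(); });
            // If the HTTP client reads slowly, pipe() pauses the FTP stream;
            // the tail of the file can then go missing when the data
            // connection finishes while the stream is still paused.
            stream.pipe(res);
        });
    });
    client.connect(/* Host, User and Password */);
}).listen(8080);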

mscdex commented 10 years ago

I'm not sure I understand what you mean by "automatically cleaned up." The raw tcp socket is passed to the get() callback and the only time that socket is destroyed is on data connection timeout.

solarflare045 commented 10 years ago

Yeah, that appears to be fine. However, the Readable stream that the .get() callback provides is immediately destroyed when the stream from the FTP server to the library ends. This is noticeable because stream.destroyed equals true at that point.

var ftp = require('ftp');
var client = new ftp();
var fs = require('fs');

client.on('ready', function() {
    client.get('/www/favicon.ico', function(err, stream) {
        if (err) {
            console.log('ERROR!');
            return;
        }

        console.log('Stream Destroyed? ' + stream.destroyed);

        stream.once('close', function() {
            console.log('Stream Destroyed? ' + stream.destroyed);
            client.end();
        });

        stream.pipe(fs.createWriteStream('favicon.ico'));

        // Simulate flow-control
        setTimeout(function() {
            stream.pause();
            setTimeout(function() {
                stream.resume();
            }, 3000);
        }, 10);
    });
});

client.connect(/* Host, User and Password */);

I guess the expected flow would be that you push the last packet to the Readable stream, the stream emits its 'data' event, and then you clean up the stream. However, if the stream is paused at that point (say, due to .pipe()'s own internal flow control), all the data after the pause is lost.

Edit: Just to clarify, the above example has issues if the FTP stream finished (i.e. the file had been fully received) between the .pause() and the .resume().

mscdex commented 10 years ago

What node version are you using?

solarflare045 commented 10 years ago
node --version
v0.10.22
mscdex commented 10 years ago

Ok so if I understand correctly, the complaint is that when you end the control connection, the data connection is immediately severed as well? If that's the case, I don't think there is much I can do about that because that's the server doing that.

Perhaps you should client.end(); on the local writable file stream's 'finish' event instead of the source socket's 'close' event?
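
In case it helps, a quick sketch of that suggestion (reusing the client and fs objects from the example above), so the control connection is only closed once the local file has been fully flushed:

client.get('/www/favicon.ico', function(err, stream) {
    if (err) throw err;

    var out = fs.createWriteStream('favicon.ico');

    // 'finish' fires once the local file has flushed everything it was given,
    // rather than when the FTP data socket closes.
    out.on('finish', function() {
        client.end();
    });

    stream.pipe(out);
});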

solarflare045 commented 10 years ago

I have tried not having that event at all ... it was still a problem.

What did solve my problem was using:

stream.on('data', function(chunk) {
    res.write(chunk);
});

Instead of:

stream.pipe(res);

(Where stream was from your library, and res is an http.ServerResponse.)

And the complaint is more that when the data connection completes, the stream object you provided is severed. Meaning that if the stream object was paused at that point in time, you lose the rest of the data. If it wasn't paused, there is no issue.
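
For reference, a fuller sketch of that workaround inside an HTTP handler; the res.end() call and the error handling are assumptions added here, not part of the original snippet:

client.get('/www/favicon.ico', function(err, stream) {
    if (err) {
        res.statusCode = 500;
        return res.end();
    }

    // Write each chunk manually instead of using pipe(), so the library's
    // handling of the data connection's end cannot drop paused/buffered data.
    stream.on('data', function(chunk) {
        res.write(chunk);
    });
    stream.on('end', function() {
        res.end();
    });
});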

mscdex commented 10 years ago

I don't know then. That's a raw socket you're working with and node-ftp (as far as I can tell) isn't meddling with it in any way that would cause something like that. :-\

polidore commented 10 years ago

I have the same problem.

polidore commented 10 years ago

It happens most often on very small files over very fast networks. May be a node issue, but I can't use pipe with node-ftp because of it. Happens almost every time on fast production hardware.

cfurst commented 10 years ago

I think @polidore is right. I have the same issue. I'm connecting to a server and downloading a large number of files (over 4,000). Besides the EMFILE errors that pop up if I don't use graceful-fs, I notice that many of the files come down empty. My guess is that the pipe is not really piping all the data. The conditions in my situation are similar to what @polidore described: small files over a fast network (and many of them). This is probably an underlying issue with node.
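
(Side note on the EMFILE part: graceful-fs is meant as a drop-in replacement for fs that queues open() calls instead of failing when the file descriptor limit is hit, so the swap is a one-liner. A sketch, nothing node-ftp specific:)

// Use graceful-fs everywhere fs would be used; open() calls beyond the
// descriptor limit are queued and retried instead of throwing EMFILE.
var fs = require('graceful-fs');
var out = fs.createWriteStream('some-local-file');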

yoitsro commented 10 years ago

I've wasted a week of my life on this! This does definitely seem to be a Node issue relating to 0.10.x. I've tried to replicate the bug with Node 0.11.13 and it seems fine. I think it was fixed with this release of Node: http://blog.nodejs.org/2013/05/13/node-v0-11-2-unstable/.

Edit: Ignore me. Just retried it a few more times and the same issue is happening.

mbatista commented 9 years ago

I am not sure the problem I was having is the same one you guys were having. The thing is, the stream was being closed at some random point before I was able to pipe it somewhere else. It was even weirder than that, since after the piping I would get an error. After a few tests I found out this stops happening if I process only one file at a time. I simply create a bunch of tasks for all the files I want to process and call them using async.series. Hope this helps someone! =)

cfurst commented 9 years ago

Doesn't that mean that you are opening a new connection for each file? Have you found that to be a little slow?

Carl Furst

mbatista commented 9 years ago

Sorry, I don't think I understand. What I do is list all the directories that interest me and find the files that I need. I basically store all the paths to the files. When I am done with this, I create a task for each path and then process them serially. Something like:

var tasks = [];

filePaths.forEach(function(filePath) {
    var task = function(done) {
        ftp.get(filePath, function(error, stream) {
            if (error) return done(error);

            stream.pipe(process.stdout);

            stream.on('close', function() {
                done();
            });
        });
    };
    tasks.push(task);
});

async.series(tasks, function() {
    console.log('Done processing all files');
});

So yes, I make a GET for each path but using the same FTP client. Did I answer your question?

cfurst commented 9 years ago

Yes, thanks! Carl Furst

dcolens commented 9 years ago

I see a similar behaviour: I call stream.pause() immediately after a get() and still receive the 'end' and 'close' events for the stream, without using pipe or on('data'). According to the node.js docs, "the end event will not fire unless the data is completely consumed": https://nodejs.org/docs/v0.10.37/api/stream.html#stream_event_close

However, with node-ftp the stream sends 'end' while no data has been consumed yet.

Looking at the code, resume() is called after the callback (https://github.com/mscdex/node-ftp/blob/master/lib/connection.js#L618), so my pause is cancelled because it happens before resume() is called in the node-ftp lib. That's very confusing; a way around this is to call pause() within a setTimeout. I wonder why resume() is called after the callback at https://github.com/mscdex/node-ftp/blob/master/lib/connection.js#L618?
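
In other words, something along these lines (the path is a placeholder); the deferred pause() runs after the library's own resume() that follows the get() callback, so it actually sticks:

client.get('/some/file', function(err, stream) {
    if (err) throw err;

    setTimeout(function() {
        stream.pause();   // now effective: node-ftp's resume() has already run
        // ... attach consumers here and call stream.resume() when ready ...
    }, 0);
});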

Calling pause() within a setTimeout does let me pause the stream, but the 'end' and 'close' events still arrive before any data is consumed.

The 'end' event issue is due to node-ftp. If I comment out lines https://github.com/mscdex/node-ftp/blob/master/lib/connection.js#L555-L569 it behaves as expected, but then the 'end' event is never sent because of the return statement outside of the if block, so the stream consumers never receive 'end' and hang.

gesior commented 9 years ago

NW 0.12.3. It looks like a related bug.

readStreamFromFTP.pipe(decryptor).pipe(writeStreamToHDD);

Error: error:0606506D:digital envelope routines:EVP_DecryptFinal_ex:wrong final block length

The read stream finishes and closes 'decryptor' (crypto lib, AES decryptor) before it has read all the data from FTP, so it calls 'decryptor.final()' on data of the wrong length and throws the Error.

After I put 'decryptor' in the 'data' and 'close' events of the FTP stream, it worked fine. The bug occurs on fast machines with file sizes over 1 MB. I ran it in a loop and got the error after 1-25 transfers (1 MB file).
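
A sketch of that event-based wiring, with a hypothetical key/iv and file names, assuming the decryptor is a standard crypto Decipher: the idea is to only finalize it once the FTP stream has really closed.

var crypto = require('crypto');
var fs = require('fs');

client.get('/remote/file.enc', function(err, readStreamFromFTP) {
    if (err) throw err;

    var decryptor = crypto.createDecipheriv('aes-256-cbc', key, iv); // key/iv defined elsewhere
    var writeStreamToHDD = fs.createWriteStream('file.dec');
    decryptor.pipe(writeStreamToHDD);

    // Feed the decryptor manually instead of piping the FTP stream into it,
    // so final() only runs after every chunk from the FTP server has arrived.
    readStreamFromFTP.on('data', function(chunk) {
        decryptor.write(chunk);
    });
    readStreamFromFTP.on('close', function() {
        decryptor.end();
    });
});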

cfurst commented 8 years ago

Maybe a queue layer using async? Put the files on a queue (like FileZilla does) and throttle it. I mean, file I/O is so damn fast that forcing node to be a bit more synchronous wouldn't hurt.
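
Something like this, perhaps; a sketch using async.queue with concurrency 1 (filePaths and the already-connected client are assumed from earlier in the thread), which matches the one-file-at-a-time approach that worked above:

var async = require('async');
var fs = require('fs');

// Each task downloads one file; concurrency 1 keeps transfers serial.
var queue = async.queue(function(filePath, done) {
    client.get(filePath, function(err, stream) {
        if (err) return done(err);

        var out = fs.createWriteStream('downloads/' + filePath.split('/').pop());
        stream.pipe(out);
        out.on('finish', done);
    });
}, 1);

queue.drain = function() {
    console.log('All files downloaded');
};

filePaths.forEach(function(filePath) {
    queue.push(filePath);
});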

dustinbolton commented 8 years ago

Perhaps related to https://github.com/mscdex/node-ftp/issues/66? There's a workaround there which may help.

tomerbaumel commented 7 years ago

What worked for me is this:

ftpClient.get("fileNameOnFtp", function (err, stream) {
        if (!!err) {
            var errMessage = 'downloading the file:  "' + metadata.filename + '" is failed. Error message:'+ err.toString();
            error(errMessage);

        } else {
            var localDownloadedFileName = "localFileName";
            var writeStream = fs.createWriteStream(localDownloadedFileName);
            stream.on('data', function(chunk) {
                writeStream.write(chunk);
            });
            stream.on('end', function () {
                writeStream.close();
                callback();
            });
        }
    });
Spaarw commented 4 years ago

Same problem for me. Has anyone solved it?