mscdex / ssh2

SSH2 client and server modules written in pure JavaScript for node.js
MIT License
5.52k stars 665 forks source link

fastGet Observations #1166

Open amitkirdatt opened 2 years ago

amitkirdatt commented 2 years ago

Node version: v14.19.1 OS: MacOS 12.1 ssh2-sftp-client: 8.0.0

Issue: We were seeing files being corrupted when using fastGet. Resolution: We needed to adjust the chunkSize specific to the sftp server we are connecting to

We started out with ssh2-sftp-client but to narrow the problem down we tried ssh2 as well. Originally posted here, but also posting in case it is useful to others

We ended up solving the problem - Just sharing our experience:

Code using ssh2-sftp-client

const Client = require('ssh2-sftp-client');

  const config = {
    host: 'xxx.xxxx.xxxx',
    username: 'xxxxxxx',
    password: 'xxxx'
  };

  const remotePath = '/remote-dir/file.txt';
  const localPath = './file.txt';

  const sftp = new Client();
  await sftp.connect(config);

  await sftp.fastGet(remotePath, localPath);
  await sftp.end();

The text file we are downloading is about ~86MB. We noticed that the file was consistently corrupted (data is missing).

To narrow down the issue we decided to whip up a quick test using ssh2

const { Client } = require('ssh2');

  const config = {
    host: 'xxx.xxxx.xxxx',
    username: 'xxxxxxx',
    password: 'xxxx'
  };

  const remotePath = '/remote-dir/file.txt';
  const localPath = './file.txt';

  const conn = new Client();
  conn.on('ready', () => {
    console.log('Client :: ready');
    conn.sftp((err, sftp) => {
      if (err) throw err;
      sftp.fastGet(remotePath, localPath, (err) => {
        if (err) throw err;
        console.log('Done!');
        conn.end();
      });
    });
  }).connect(config);

We noticed the same corruption.

We then fired up the native sftp client in debug mode and noticed that the server rmax value was 32768 (same as the default chunkSize)

sftp -vv username@xxxx.xxxx.xxx
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 1073741824 rmax 32768
debug2: channel_input_status_confirm: type 99 id 0
debug2: subsystem request accepted on channel 0
debug2: Remote version: 3
debug2: Unrecognised server extension "versions"
debug2: Server supports extension "fsync@openssh.com" revision 1
debug2: Server supports extension "posix-rename@openssh.com" revision 1
debug2: Server supports extension "statvfs@openssh.com" revision 2
debug2: Server supports extension "fstatvfs@openssh.com" revision 2
debug2: Server supports extension "hardlink@openssh.com" revision 1

We then added debug logging to ssh2

debug: log => console.log(log)

In the debug we noticed the following:

Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:5440, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:5441, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:5442, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:5443, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5444, 816)
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5445, 816)
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5446, 816)
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5447, 816)
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5448, 816)
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5449, 816)
Inbound: CHANNEL_DATA (r:0, 829)
SFTP: Inbound: Received DATA (id:5450, 816)
Inbound: CHANNEL_DATA (r:0, 829)

It looked liked the DATA was being broken up into 2 parts: 1 part was 31952 and the 2nd part was 816, which when added up is 32768. We figured this was causing the issue.

We then specified the chunkSize of 31952 for fastGet. We didn't see the DATA being broken up into two parts like before (31952 and 816).

Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:2768, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:2769, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:2770, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:2771, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)
SFTP: Outbound: Buffered READ
Inbound: CHANNEL_DATA (r:0, 31965)
SFTP: Inbound: Received DATA (id:2772, 31952)
Outbound: Sending CHANNEL_DATA (r:0, 41)

The files are now downloading correctly.

I am sharing our experience in the hopes that somebody finds it useful.

ssh2-sftp-client updated code that worked for us:

const Client = require('ssh2-sftp-client');

  const config = {
    host: 'xxx.xxxx.xxxx',
    username: 'xxxxxxx',
    password: 'xxxx'
  };

  const remotePath = '/remote-dir/file.txt';
  const localPath = './file.txt';

  const sftp = new Client();
  await sftp.connect(config);

  await sftp.fastGet(remotePath, localPath, {chunkSize: 31952});
  await sftp.end();

ssh2 code:

const { Client } = require('ssh2');

  const config = {
    host: 'xxx.xxxx.xxxx',
    username: 'xxxxxxx',
    password: 'xxxx'
  };

  const remotePath = '/remote-dir/file.txt';
  const localPath = './file.txt';

  const conn = new Client();
  conn.on('ready', () => {
    console.log('Client :: ready');
    conn.sftp((err, sftp) => {
      if (err) throw err;
      sftp.fastGet(remotePath, localPath,  {chunkSize: 31952}, (err) => {
        if (err) throw err;
        console.log('Done!');
        conn.end();
      });
    });
  }).connect(config);
mscdex commented 2 years ago

I can't reproduce the problem against an OpenSSH server. I've tried different node versions, I've tried forcing the SFTP connection to not assume OpenSSH so that it uses the different, lower max read size, and I've tried different file sizes. Every time the SHA1 hash matches on both ends.

What is the ident of this server? Is it not OpenSSH, despite advertising OpenSSH SFTP extensions?

mscdex commented 2 years ago

I'd also be curious to know if the OpenSSH SFTP client exhibits the same corruption issue and if not, what the OpenSSH SFTP client debug output looks like for that transfer.

mscdex commented 2 years ago

Can you try the master branch of ssh2 directly? There are some changes there that haven't been released yet and I'm wondering if the issue has been fixed or not.

Michael-Gibbons commented 1 year ago

@mscdex I was able to replicate this issue on the latest version. 1gb file was consistently corrupted. I agree with @amitkirdatt that it is a chunk size negotiation issue but I am unsure how else to help.

mscdex commented 1 year ago

@Michael-Gibbons well, without any way for me to (reliably) reproduce the problem on my end, it's going to have to be up to others to debug the issue unfortunately.

tedeh commented 1 year ago

If it's any help I can confirm the problem does not exist on ssh2 v0.8.9 but it does exist on v1.13.0. Against the same remote server, of course.

Also, here is the remote server software I think?

debug1: Remote protocol version 2.0, remote software version GoAnywhere7.1.2
robbe482 commented 1 year ago

Had the exact same problem with both libraries. Thanks for sharing your solution!

lpavliuk commented 1 year ago

Hey guys, I had the same issue. However, having read the documentation again, I found that .fastGet() uses a chunk reading, which does not work for a huge file because a chunk concatenation can corrupt the file, as we see. In this case, .createReadStream() should be used. It works well for files of 1Mb+ size.

Here is the implementation:

const _sftp_get_file = (sftp_conn, {remote_file_path, local_file_path}) => new Promise(resolve => {
    let sftp_read_stream = sftp_conn.createReadStream(remote_file_path, {autoClose: true}),
        write_stream = fs.createWriteStream(local_file_path),
        downloaded_successfully = false,
        _resolve = ({result = false, is_external_error = false, message = ''}) => resolve({
            result,
            is_external_error,
            message,
        })
        ;

    console.log(`Download the file ${remote_file_path} => ${local_file_path} from SFTP...`);

    sftp_read_stream.on('error', (err) => {
        log_error(`Failed ReadStream: ${err.message}`, {err, remote_file_path});
        _resolve({
            result: false,
            is_external_error: true,
            message: `Failed to download a file from SFTP: ${err.message}`
        });
    });

    write_stream.on('error', (err) => {
        log_error(`Failed WriteStream: ${err.message}`, {err, local_file_path});
        _resolve({ result: false });
    });

    write_stream.on('finish', () => {
        console.log(`[!] File has been saved successfully`);
        downloaded_successfully = true;
    });

    sftp_read_stream.on('close', () => {
        downloaded_successfully
            ? _resolve({ result: true })
            : 0 // means an Error has occurred
    });

    sftp_read_stream.pipe(write_stream);
});