theophilusx / ssh2-sftp-client

a client for SSH2 SFTP
Apache License 2.0

Slow download speed #503

Closed JPVRS closed 6 months ago

JPVRS commented 8 months ago

I'm running into this issue as I try to script the downloading of a roughly 1 GB file from a remote SFTP host. The file downloads in roughly 1 minute using WinSCP. The script is successfully connecting and starting the download, but the download speed is something like 10 MB per minute. I'm sure I'm doing something wrong, but not sure what. I've tried playing around with the highWaterMark and concurrency settings (and also not using them at all) and the result doesn't change. I'm currently running this on a Windows 10 machine.

Update: I've resolved this by switching to using SCP for the download. It's possible I was just choosing the wrong tool for the job.

require("dotenv").config();
const SftpClient = require("ssh2-sftp-client");
const AdmZip = require("adm-zip");
const os = require("os");
const path = require("path");
const fs = require("fs");

// Constants for remote paths
const REMOTE_PATH = "./remote/path";

// Directory where the zip archives will be downloaded and extracted
const downloadDirectory = path.join(
  os.homedir(),
  "local",
  "path",
);
const extractedFilesDirectory = path.join(downloadDirectory, "extracted-files");

// Ensure the download and extraction directories exist
const ensureDirectoriesExist = () => {
  if (!fs.existsSync(downloadDirectory)) {
    fs.mkdirSync(downloadDirectory, { recursive: true });
    console.log(`Created download directory: ${downloadDirectory}`);
  }

  if (!fs.existsSync(extractedFilesDirectory)) {
    fs.mkdirSync(extractedFilesDirectory, { recursive: true });
    console.log(`Created extraction directory: ${extractedFilesDirectory}`);
  }
};

// Connect to the SFTP server and list zip archives
const listArchives = async () => {
  const client = new SftpClient();
  const host = process.env.FTP_HOST;
  const port = process.env.FTP_PORT;
  const username = process.env.FTP_USER_NAME;
  const password = process.env.FTP_PASSWORD;

  try {
    console.log(`Connecting to SFTP server at ${host}:${port}...`);
    await client.connect({ host, port, username, password });
    console.log("Connected to SFTP server.");
    console.log(`Navigating to remote folder: ${REMOTE_PATH}`);

    const fileList = await client.list(REMOTE_PATH);
    const zipFiles = fileList
      .filter((file) => file.type === "-" && file.name.endsWith(".zip"))
      .sort((a, b) => new Date(a.modifyTime) - new Date(b.modifyTime));

    console.log("Zip archives found (oldest to newest):");
    zipFiles.forEach((file) => console.log(file.name));

    return { client, zipFiles };
  } catch (err) {
    console.error("Failed to connect to SFTP server or list archives:", err);
    client.end();
    process.exit(1);
  }
};

// Prompt user to select an archive to download
const selectArchiveToDownload = async (zipFiles) => {
  const { default: inquirer } = await import("inquirer");
  const choices = zipFiles.map((file) => file.name);
  const questions = [
    {
      type: "list",
      name: "selectedArchive",
      message: "Select a zip archive to download:",
      choices: choices,
    },
  ];

  const answers = await inquirer.prompt(questions);
  return answers.selectedArchive;
};

// Download and extract the selected archive
const downloadAndExtractArchive = async (client, selectedArchive) => {
  const localPath = path.join(downloadDirectory, selectedArchive);
  // Remote SFTP paths use forward slashes; path.join() would insert backslashes on Windows
  const remoteFilePath = path.posix.join(REMOTE_PATH, selectedArchive);

  try {
    console.log(`Downloading ${selectedArchive} to ${localPath}...`);
    await client.fastGet(remoteFilePath, localPath, {
      highWaterMark: 2048576,
      concurrency: 8,
    });

    console.log(`Downloaded ${selectedArchive}.`);

    console.log(`Extracting ${selectedArchive}...`);
    const zip = new AdmZip(localPath);
    zip.extractAllTo(extractedFilesDirectory, true);
    console.log(`Extracted to ${extractedFilesDirectory}.`);
  } catch (err) {
    console.error("Failed to download or extract archive:", err);
    process.exit(1);
  } finally {
    client.end();
  }
};

// Main script function
const main = async () => {
  ensureDirectoriesExist();
  const { client, zipFiles } = await listArchives();

  if (zipFiles.length === 0) {
    console.log("No zip archives to download.");
    client.end();
    return;
  }

  const selectedArchive = await selectArchiveToDownload(zipFiles);
  await downloadAndExtractArchive(client, selectedArchive);
};

main();

theophilusx commented 8 months ago

What version of ssh2-sftp-client? What version of node are you running? What platform are you running on?

If there is a problem here, it is almost certainly going to be something related to the ssh2 library rather than ssh2-sftp-client, as this library just wraps the ssh2 functionality in a promise API and has no direct influence on the underlying ssh transfer operations. This means you would need to work with the ssh2 maintainer to identify possible problems. The downside is that you will first need to replicate the issue using just ssh2 (which actually wouldn't be that hard, especially as you can use the source for ssh2-sftp-client to get an idea of how to do this, plus there are some example scripts in the validation directory of the ssh2-sftp-client repository).
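As a rough illustration of what replicating the issue with just ssh2 might look like, here is a minimal sketch. The function name `fastGetWithPlainSsh2` is hypothetical and not part of either library. The `Client` class is passed in rather than required at the top, on the assumption that a real run would supply it with `const { Client } = require("ssh2");`.

```javascript
// Sketch only: download one file using ssh2 directly, with no
// ssh2-sftp-client wrapper in between.
// `Client` is injected by the caller; in a real run it is assumed to be
// the Client class exported by the ssh2 package.
function fastGetWithPlainSsh2(Client, config, remotePath, localPath) {
  return new Promise((resolve, reject) => {
    const conn = new Client();
    conn
      .on("ready", () => {
        // Open the SFTP subsystem on the established SSH connection.
        conn.sftp((sftpErr, sftp) => {
          if (sftpErr) {
            conn.end();
            reject(sftpErr);
            return;
          }
          const startedAt = Date.now();
          // No options object is passed, so ssh2's defaults apply.
          sftp.fastGet(remotePath, localPath, (getErr) => {
            conn.end();
            if (getErr) reject(getErr);
            else resolve({ elapsedMs: Date.now() - startedAt });
          });
        });
      })
      .on("error", reject)
      .connect(config);
  });
}
```

Calling it with the same host, port, username and password as the original script, and comparing `elapsedMs` with the ssh2-sftp-client run, should show whether the wrapper makes any difference.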

The first thing I would do is confirm that using the get() method is also slow. The fastGet() and fastPut() methods are very dependent on the capabilities and support within the remote sftp server. My experience has been that when fastGet() doesn't work or doesn't work reliably, you're much better off using just plain get(). I've also found that with some combinations of sftp servers and file sizes, get() is better than fastGet(). So, the first step is to confirm the slow download speed with get().
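One way to make that comparison concrete is to time both methods against the same file. The sketch below assumes `client` is an already connected ssh2-sftp-client instance (or anything else exposing `get(remote, local)` and `fastGet(remote, local)`). The helper names `megabytesPerSecond`, `timeDownload` and `compareDownloads` are made up for illustration.

```javascript
// Sketch only: time get() and fastGet() on the same remote file.

// Convert a byte count and elapsed milliseconds into MB/s (1 MB = 1e6 bytes).
function megabytesPerSecond(bytes, elapsedMs) {
  if (elapsedMs <= 0) return Infinity;
  return bytes / 1e6 / (elapsedMs / 1000);
}

// Run a single download method by name and report how long it took.
async function timeDownload(client, method, remotePath, localPath, sizeBytes) {
  const startedAt = Date.now();
  await client[method](remotePath, localPath);
  const elapsedMs = Date.now() - startedAt;
  return { method, elapsedMs, mbPerSec: megabytesPerSecond(sizeBytes, elapsedMs) };
}

// Time both methods one after the other (not in parallel, so they do not
// compete for bandwidth). Each run writes to its own local file.
async function compareDownloads(client, remotePath, localPath, sizeBytes) {
  const results = [];
  for (const method of ["get", "fastGet"]) {
    const target = `${localPath}.${method}`;
    results.push(await timeDownload(client, method, remotePath, target, sizeBytes));
  }
  return results;
}
```

The file size can be taken from the `size` field of the entries returned by `client.list()`, which the original script already calls. Neither call passes any options here, so both methods run with their defaults.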

The next step would be to test against a different sftp server. There are some very poor sftp servers out there and a lot which are not standards compliant. Therefore, it is important to ensure the issue isn't with the server.

A quick scan of your code doesn't reveal any obvious errors. The one thing I would do is remove the settings for high water mark and concurrency. If you want to try out these settings, you really need to incorporate some low level testing/debugging, as getting these settings wrong will have the opposite effect and slow things down or possibly result in data corruption. With fastGet/fastPut, more concurrency or higher water marks don't necessarily mean faster throughput. There are a lot of posts to the ssh2 issues list regarding experiments and various settings people have tried with respect to fastGet/fastPut, so scanning some of those might help. However, be aware that ssh2 v1+ is a complete re-write, so some of the older posts may not be relevant. My suspicion is you will find get() to be faster than fastGet().
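For the low level testing mentioned above, one lightweight option is to leave concurrency and chunk size at their defaults and only attach a `step` callback, which fastGet() calls with `(totalTransferred, chunk, total)` as chunks arrive. The sketch below builds such a callback. `makeProgressLogger` and its `log`, `now` and `intervalMs` options are hypothetical names, not part of the library.

```javascript
// Sketch only: build a `step` callback for fastGet() that logs progress and
// average throughput at most once per interval, plus once on completion.
function makeProgressLogger({ log = console.log, now = Date.now, intervalMs = 1000 } = {}) {
  const startedAt = now();
  let lastLoggedAt = startedAt;
  return (totalTransferred, chunk, total) => {
    const t = now();
    const done = totalTransferred >= total;
    // Skip intermediate updates that arrive too soon after the last log line.
    if (!done && t - lastLoggedAt < intervalMs) return;
    lastLoggedAt = t;
    const seconds = (t - startedAt) / 1000;
    const rate = seconds > 0 ? totalTransferred / 1e6 / seconds : 0;
    const percent = total > 0 ? (totalTransferred / total) * 100 : 100;
    log(`${percent.toFixed(1)}% (${rate.toFixed(2)} MB/s)`);
  };
}
```

It could then be used as `await client.fastGet(remoteFilePath, localPath, { step: makeProgressLogger() });`. If the logged rate is low from the start, rather than starting high and falling off, that points towards the server or network rather than the tuning options.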

Finally, and probably so obvious it doesn't need mentioning, don't expect to get the same performance as you do with WinSCP (or any other dedicated sftp client). Just the fact you're running under node will mean slower performance. I would also use the OpenSSH CLI sftp program as the baseline, as the ssh2 sftp library uses the same standards as OpenSSH, while programs like WinSCP may include non-standard proprietary extensions available in some sftp servers (especially on MS Windows/Azure platforms).

JPVRS commented 8 months ago

Thank you for the response. Next time I'll try simply using get() and the other steps you suggested. Please feel free to close.