kylefarris / clamscan

A robust ClamAV virus scanning library supporting scanning files, directories, and streams with local sockets, local/remote TCP, and local clamscan/clamdscan binaries (with failover).
MIT License

Scanning files with passthrough times out #69

Closed: hellovietduc closed this issue 3 years ago

hellovietduc commented 3 years ago

Describe the bug

I'm implementing a function that receives a file URL, creates a readable stream with axios, and then pipes that stream to clamscan with the passthrough method. However, I always receive COMMAND READ TIMED OUT in the debug logs and is_infected is null. It usually takes around 3 minutes to respond.

Code to reproduce

This is the simplified code.

import axios from 'axios'
import NodeClam from 'clamscan'
import type { Readable } from 'stream'

const nodeClam = new NodeClam()
const clam = await nodeClam.init(clamScanOptions)

const scanRemoteFile = async (url: string): Promise<ScanResult> => {
  // axios returns a generic readable stream for responseType 'stream'
  let fileStream: Readable
  try {
    // Get a readable stream from this URL
    const res = await axios({
      method: 'GET',
      url,
      responseType: 'stream',
    })
    fileStream = res.data
  } catch (e) {
    console.error(e)
    // Rethrow so we never try to pipe an undefined stream below
    throw e
  }

  // Send output of this file stream to ClamAV
  const av = clam.passthrough()
  fileStream.pipe(av)

  return new Promise((resolve, reject) => {
    av.on('scan-complete', (result) => {
      resolve(result)
    })
    // other listeners
  })
}

Debug logs

node-clam: DEBUG MODE ON
node-clam: Could not verify the clamdscan binary. [Error: ENOENT: no such file or directory, access '/usr/bin/clamdscan'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'access',
  path: '/usr/bin/clamdscan'
}
node-clam: Could not verify the clamscan binary. [Error: ENOENT: no such file or directory, access '/usr/bin/clamscan'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'access',
  path: '/usr/bin/clamscan'
}
node-clam: Initially testing socket/tcp connection to clamscan server.
node-clam: using remote server: xxx.xxx.xxx.xxx:3310
node-clam: Established connection to clamscan server!
node-clam: PONG!
node-clam: Established connection to clamscan server!
node-clam: Socket/Host connection closed.
node-clam: using remote server: xxx.xxx.xxx.xxx:3310
node-clam: ClamAV Socket Initialized...
node-clam: Doing initial transform!
node-clam: Got result! COMMAND READ TIMED OUT

node-clam: Error Response:  COMMAND READ TIMED OUT
node-clam: File may be INFECTED!
node-clam: Processed Result:  {
  is_infected: null,
  viruses: [],
  file: null,
  resultString: 'COMMAND READ TIMED OUT'
} COMMAND READ TIMED OUT

node-clam: ClamAV socket has received the last chunk!
node-clam: Error Response:  COMMAND READ TIMED OUT
node-clam: File may be INFECTED!
node-clam: Result of scan: {
  is_infected: null,
  viruses: [],
  file: null,
  resultString: 'COMMAND READ TIMED OUT'
}
node-clam: It took false seconds to scan the file(s).

Expected behavior

Files can be scanned normally.

Other information

hellovietduc commented 3 years ago

I have a theory and a suggestion:

kylefarris commented 3 years ago

Hey @hellovietduc! That's interesting... I've never seen this issue before. I've actually never even seen that particular message/result string at all before.

What's even more interesting is that this module does handle timeouts on the socket connection, but in your case the connection doesn't seem to be getting closed by a timeout; otherwise you'd see "Connection to host/socket has timed out" somewhere in your logs. Instead, the ClamAV server is responding with data, and that response is what tells the client there was a timeout. So, we'd have to handle that in the _process_result method of the module. Right now it just assumes that any response other than the specified ones is an error. That's not really wrong, per se, but since we already have a way to watch for timeouts, I suppose we could emit a timeout event when that message is received from the remote ClamAV server.

Try adding this chunk of code after line 619:

if (result === 'COMMAND READ TIMED OUT') {
    // Create the error once so the emitted event and the returned NodeClamError share it
    const error = new Error('Scanning on remote host/socket has timed out!');
    this.emit('timeout', error);
    if (this.settings.debug_mode) {
        console.log(`${this.debug_label}: Scanning file has timed out. Message:`, result);
        console.log(`${this.debug_label}: File may be INFECTED!`);
    }
    return new NodeClamError({error}, `A timeout occurred while scanning the piped-through stream: ${result}`);
}

If that helps detect the timeout, please report back and/or put in a PR with that block of code included.
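To illustrate the idea of treating the timeout reply as its own case rather than a generic error, here is a minimal standalone sketch of a reply classifier. It is not the module's _process_result: the names `ReplyKind` and `classifyReply` are hypothetical, and it assumes the usual clamd reply shapes for a streamed scan (`stream: OK` and `stream: <signature> FOUND`) plus the `COMMAND READ TIMED OUT` string from the logs above.

```typescript
// Hypothetical helper for illustration only; not part of the clamscan module.
type ReplyKind =
  | { kind: 'clean' }
  | { kind: 'infected'; viruses: string[] }
  | { kind: 'timeout' }
  | { kind: 'error'; message: string };

function classifyReply(raw: string): ReplyKind {
  // Replies arrive with a trailing newline or NUL terminator; strip it first
  const reply = raw.replace(/[\0\r\n]+$/g, '').trim();

  // The server-side timeout message seen in the debug logs
  if (reply === 'COMMAND READ TIMED OUT') return { kind: 'timeout' };

  // Assumed clean-reply shape: "stream: OK"
  if (/:\s+OK$/.test(reply)) return { kind: 'clean' };

  // Assumed infected-reply shape: "stream: <signature> FOUND"
  const found = reply.match(/:\s+(.+)\s+FOUND$/);
  if (found) return { kind: 'infected', viruses: [found[1]] };

  // Anything else is still treated as an error, as the module does today
  return { kind: 'error', message: reply };
}
```

With a separate `timeout` case, a caller can decide to retry or report a timeout instead of receiving `is_infected: null` with no explanation.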

hellovietduc commented 3 years ago

@kylefarris Thanks for this. But what do you think is the cause that leads to this timeout? Is it how I pass a stream directly from axios to passthrough or is it the docker-clamav that I'm using?

kylefarris commented 3 years ago

I'm not sure what's causing the timeout; I haven't seen this before.

I'm not going to be able to accept your PR since it's not going to work the way you think. I have a new branch and draft PR I'm working on that will properly emit a timeout event when expected, but Travis-CI just isn't cooperating (the tests run fine locally). I've tried to switch to GitHub Actions but I'm having different issues there. They're very difficult issues to solve since they have to do with the ClamAV configuration on the CI VMs, and there's really no easy way to debug it.
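For the calling side, here is a rough sketch of how a passthrough consumer could react to such a timeout event. It assumes the stream returned by passthrough emits `scan-complete`, `timeout`, and `error`; the helper name `waitForScan` and the `ScanResult` shape are hypothetical, modeled on the result object in the debug logs. It also rejects when `is_infected` is null, since that result carries no verdict.

```typescript
import { EventEmitter } from 'events';

// Hypothetical result shape, based on the object printed in the debug logs
interface ScanResult {
  is_infected: boolean | null;
  viruses: string[];
  resultString?: string;
}

// Hypothetical helper: turns the stream's events into a single promise
function waitForScan(av: EventEmitter): Promise<ScanResult> {
  return new Promise((resolve, reject) => {
    av.once('scan-complete', (result: ScanResult) => {
      if (result.is_infected === null) {
        // No verdict (e.g. the server replied with a timeout message)
        reject(new Error(`Scan was inconclusive: ${result.resultString ?? 'no result string'}`));
      } else {
        resolve(result);
      }
    });
    // Assumed event: fired when the scan times out
    av.once('timeout', (err?: Error) => reject(err ?? new Error('Scan timed out')));
    av.once('error', reject);
  });
}
```

In the function from the original report, this would replace the hand-written promise: `return waitForScan(av)` after `fileStream.pipe(av)`.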

hellovietduc commented 3 years ago

Thank you.

kylefarris commented 3 years ago

@hellovietduc Also, for what it's worth, you don't need to use the passthrough method if you are not sending the requested file somewhere else (Examples: writing to disk, sending to S3, etc...). If you want to just scan a stream, use the scan_stream method as it's much simpler to implement.
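As a rough sketch of that simpler approach: the helper below takes anything exposing a `scan_stream` method and a readable stream, and returns the list of detected signatures. The `StreamScanner` interface and the `scanOnly` name are hypothetical, and the `{ is_infected, viruses }` result shape is an assumption based on the result object in the debug logs above, so check it against the version of the module you have installed.

```typescript
import { Readable } from 'stream';

// Hypothetical minimal interface: only what this sketch needs from the
// initialized clamscan instance (the result shape is an assumption)
interface StreamScanner {
  scan_stream(stream: Readable): Promise<{ is_infected: boolean | null; viruses: string[] }>;
}

// Hypothetical helper: scan a stream without forwarding it anywhere else
async function scanOnly(clam: StreamScanner, fileStream: Readable): Promise<string[]> {
  const { is_infected, viruses } = await clam.scan_stream(fileStream);

  // null means no verdict was reached, so do not report the file as clean
  if (is_infected === null) {
    throw new Error('Scan was inconclusive (no verdict from ClamAV)');
  }

  // Empty list for a clean file, signature names otherwise
  return is_infected ? viruses : [];
}
```

With the axios request from the original report, the call would look like `await scanOnly(clam, res.data)`, with no passthrough stream or event listeners needed.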