kylefarris / clamscan

A robust ClamAV virus scanning library supporting scanning files, directories, and streams with local sockets, local/remote TCP, and local clamscan/clamdscan binaries (with failover).
MIT License

Increasing file size does not work with daemon + Clamscan "INSTREAM: Size limit reached, (requested: 65536, max: 0)" #131

Open philly-vanilly opened 1 month ago

philly-vanilly commented 1 month ago

I am using the google-cloud-sdk/slim Docker image with this config (note the non-default size limits):

RUN apt-get install clamav-daemon -y && \
    npm ci && \
    curl -sSL https://sdk.cloud.google.com/ | bash && \
    echo "StreamMaxLength 40M" >> /etc/clamav/clamd.conf && \
    echo "MaxFileSize 40M" >> /etc/clamav/clamd.conf && \
    echo "MaxScanSize 40M" >> /etc/clamav/clamd.conf && \
    mkdir /unscanned_files
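For reference, clamd size values like 40M accept K/M/G suffixes; as far as I can tell these are binary multiples (1M = 1,048,576 bytes), so 40M is roughly 41.9 million bytes, comfortably above a 30 MB file. A quick sketch of that conversion (the helper is my own, not part of ClamAV or any client library):

```javascript
// Convert a clamd.conf size string such as "40M" to bytes.
// Assumption: K/M/G are 1024-based multiples, which is how I read
// ClamAV's size handling -- treat this as a sketch, not a spec.
function clamdSizeToBytes(value) {
  const match = /^(\d+)([KMG]?)$/i.exec(value.trim());
  if (!match) throw new Error(`Unrecognized size: ${value}`);
  const units = { '': 1, K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  return Number(match[1]) * units[match[2].toUpperCase()];
}
```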

and

  try {
    if (clamscan === null) {
      clamscan = await (new NodeClam().init({
        removeInfected: false,
        quarantineInfected: false,
        scanLog: null,
        debugMode: true,
        fileList: null,
        scanRecursively: true, // If true, deep scan folders recursively
        clamdscan: {
          host: XXX,
          port: XXX,
          timeout: 60000,
          localFallback: false,
          path: null,
          multiscan: true,
          bypassTest: false,
          configFile: '/etc/clamav/clamd.conf'
        },
        preference: 'clamdscan'
      }));
    }

But every scan attempt on a 30 MB file ends in the error from the title. I have tried multiple Node.js clients (clamscan, clamdjs) and get the same error, so I believe the problem lies with the daemon itself. What is surprising is that the error says "max: 0": 0 either means there is no limit at all, or the limit really is 0, but if it were 0 I would not be able to scan files below 25 MB either. Other ClamAV users have reported the same thing: https://github.com/Cisco-Talos/clamav/issues/1210. I would have thought a slightly larger file should be no problem; others seem to use ClamAV for gigabyte-sized files, although perhaps not from Node.js.
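For context, the error comes from clamd's INSTREAM command, which clients like this library use to stream file contents over the socket: each chunk is sent with a 4-byte big-endian length prefix, and the daemon aborts with exactly this error once the accumulated size passes StreamMaxLength. A minimal sketch of the framing (the helper name is my own, not from the clamscan library):

```javascript
// Sketch of clamd's INSTREAM framing: the session opens with the
// command, then each chunk is a 4-byte big-endian length followed by
// the data; a zero-length chunk terminates the stream.
const INSTREAM_CMD = Buffer.from('zINSTREAM\0', 'ascii');
const END_OF_STREAM = Buffer.alloc(4); // four zero bytes

// Frame one chunk of data for INSTREAM (hypothetical helper).
function instreamChunk(data) {
  const prefix = Buffer.alloc(4);
  prefix.writeUInt32BE(data.length, 0);
  return Buffer.concat([prefix, data]);
}
```

If the daemon's effective StreamMaxLength is lower than expected, it closes the connection mid-stream with "INSTREAM size limit exceeded. ERROR", which matches the behavior reported here.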

The error seems to be present in other (python) integrations as well: https://github.com/Cisco-Talos/clamav/issues/942

The only hint with upvotes I could find was for the C# client https://stackoverflow.com/questions/39371037/how-change-limit-file-size-of-clamd-service-for-nclam that seems to have the option "MaxStreamSize = 52428800" in addition to the clamd configuration. Is ClamScan perhaps missing a buffering option?

How can I proceed here? Has anyone got clamscan to work with even slightly bigger files?

kylefarris commented 1 month ago

What method are you using to scan the file(s)? isInfected?

philly-vanilly commented 1 month ago

@kylefarris yes, const { isInfected, viruses } = await clamscan.isInfected(fileLocation); It's a single PDF file with some large high-resolution sample images inside, not an archive.

Can you confirm it is possible to scan slightly larger files with clamscan in general? If so, I would look for other issues within my setup.

kylefarris commented 1 month ago

@philly-vanilly I'm able to scan a 78 MB DMG file just fine on macOS 🙂 No special configuration. The config file for my daemon has the defaults for the items you listed above:

# Close the connection when the data size limit is exceeded.
# The value should match your MTA's limit for a maximum attachment size.
# Default: 100M 
#StreamMaxLength 25M

# This option sets the maximum amount of data to be scanned for each input
# file. Archives and other containers are recursively extracted and scanned
# up to this value.
# Value of 0 disables the limit
# Note: disabling this limit or setting it too high may result in severe damage
# to the system.
# Default: 400M
#MaxScanSize 1000M

# Files larger than this limit won't be scanned. Affects the input file itself
# as well as files contained inside it (when the input file is an archive, a
# document or some other kind of container).
# Value of 0 disables the limit.
# Note: disabling this limit or setting it too high may result in severe damage
# to the system.
# Technical design limitations prevent ClamAV from scanning files greater than
# 2 GB at this time.
# Default: 100M
#MaxFileSize 400M

I scanned the large file via socket, host:port, and local fallback (CLI), and it worked fine in all cases.

philly-vanilly commented 1 month ago

@kylefarris I have reduced my setup to a minimal POC, but I still cannot scan big files, while small ones work. Could you please share your configuration? Or perhaps you can see something in mine that looks weird?

Here is my Dockerfile:

FROM ubuntu:20.04
RUN apt-get update && \
    apt-get install -y clamav clamav-daemon curl && \
    apt-get clean
RUN freshclam
RUN mkdir -p /var/run/clamav && \
    chown clamav:clamav /var/run/clamav
RUN echo "TCPSocket 3310" >> /etc/clamav/clamd.conf && \
    echo "TCPAddr 0.0.0.0" >> /etc/clamav/clamd.conf
COPY start-clamav.sh /usr/local/bin/start-clamav.sh
RUN chmod +x /usr/local/bin/start-clamav.sh
EXPOSE 3310
CMD ["/usr/local/bin/start-clamav.sh"]

with the sh script being

#!/bin/bash
service clamav-daemon start
tail -f /var/log/clamav/clamav.log

and I execute the scan with this function:

const NodeClam = require('clamscan');

let clamscan = null; // cached instance, initialized on first use

async function scanLocalFile(fileLocation) {
    if (clamscan === null) {
      clamscan = await (new NodeClam().init({
        removeInfected: false,
        quarantineInfected: false,
        scanLog: null,
        debugMode: true,
        fileList: null,
        scanRecursively: true, // If true, deep scan folders recursively
        clamdscan: {
          host: '0.0.0.0',
          port: 3310,
          timeout: 60000,
          localFallback: false,
          path: null,
          multiscan: true,
          bypassTest: false,
          configFile: '/etc/clamav/clamd.conf'
        },
        preference: 'clamdscan'
      }));
    }
    return await clamscan.isInfected(fileLocation);
}

which for now is being called in jest:

const {scanLocalFile} = require("../src/scan");
it('test output', async () => {
    const res = await scanLocalFile('/Users/.../big-file.pdf');
    console.log(res);
});

I get a proper response for a small file, but for a big one, the process terminates with:

Error: Could not scan file via TCP or locally!

    at .../node_modules/clamscan/index.js:1134:37

  console.log
    node-clam: Raw Response:  INSTREAM size limit exceeded. ERROR // for a small file, this is stream: OK

      at Socket.<anonymous> (node_modules/clamscan/index.js:2328:37)

  console.log
    node-clam: Error Response:  INSTREAM size limit exceeded. 

      at NodeClam._processResult (node_modules/clamscan/index.js:805:54)

  console.log
    node-clam: File may be INFECTED!

      at NodeClam._processResult (node_modules/clamscan/index.js:806:54)
kylefarris commented 1 month ago

Are you able to scan the large file locally using the command line? You can use telnet to test if the file can be scanned that way.

Assuming you have telnet installed, type in:

telnet 0.0.0.0 3310

Then, assuming it connects, type the following in and press Enter:

nSCAN /Users/.../big-file.pdf

It should take a second, respond with something like this, and immediately close the socket:

 /Users/.../big-file.pdf: OK

Mine throws an inconsequential "Error processing command. ERROR" before the OK line.

If it gives you the INSTREAM size limit issue there, you'll probably have more luck talking with ClamAV experts on Serverfault or something similar as the issue wouldn't be anything to do with this Node package.