lambci / yumda

Yum for AWS Lambda
MIT License

Amazon Linux Extras: ClamAV #3

Closed StevenACoffman closed 5 years ago

StevenACoffman commented 5 years ago

Use Case:

I would like an S3 event trigger to scan new files in a bucket for viruses using ClamAV.

As an example, see slamscan, or the updated version of slamscan.

On a typical EC2 instance, I run:
sudo amazon-linux-extras install epel
sudo yum install clamav freshclam clamav-update

I would like these to be available in a yum repository, compiled against the Lambda environment.

mhart commented 5 years ago

Awesome, thanks for the suggestion. I'll look into it!

mhart commented 5 years ago

There's a decision that needs to be made around where the DB files should be stored.

By default yumda specifies the /var directory to be /tmp/var – because typically files under /var need to be writeable and /tmp is the only writable place in Lambda.

That means that, by default, the dbdir that clamav will be compiled with will be /tmp/var/lib/clamav and the various *.cvd files and mirrors.dat will need to live in there. So they can't be included in /opt and zipped up with the layer. They can, of course, be downloaded into /tmp/var/..., say from S3, during a Lambda execution – which I believe is what https://github.com/randytarampi/slamscan does as well.

Any thoughts on this? Will /tmp/var/lib/clamav be fine, or do you think it should be under, say, /opt/share (in which case it won't be writable by Lambda, and it may also run into size limitations)?
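The option of populating /tmp/var/lib/clamav at execution time could look roughly like the sketch below in a Node handler. It is only an illustration: `ensureDefinitions` and the `fetchDefinition` callback are hypothetical names, the callback stands in for whatever actually retrieves the bytes (an S3 read, for instance), and `main.cvd` and `daily.cvd` are example file names. The demo at the end uses a scratch directory and a stub rather than a real download.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// dbdir the yumda clamav package is compiled with, per the comment above.
const DB_DIR = "/tmp/var/lib/clamav";

// Hypothetical callback: returns the bytes of one definition file,
// for example by reading an object from an S3 bucket.
type FetchDefinition = (name: string) => Promise<Uint8Array>;

// Writes every definition file that is not already present in dbDir and
// returns the names that had to be fetched on this call.
async function ensureDefinitions(
  names: string[],
  fetchDefinition: FetchDefinition,
  dbDir: string = DB_DIR,
): Promise<string[]> {
  mkdirSync(dbDir, { recursive: true });
  const fetched: string[] = [];
  for (const name of names) {
    const target = join(dbDir, name);
    // Skip files that are already there, e.g. left by an earlier invocation.
    if (existsSync(target)) continue;
    writeFileSync(target, await fetchDefinition(name));
    fetched.push(name);
  }
  return fetched;
}

// Demo against a scratch directory, with a stub in place of a real download.
const demoDir = mkdtempSync(join(tmpdir(), "clamav-db-"));
const stubFetch: FetchDefinition = async (name) => Buffer.from(`stub ${name}`);
const demoRun = ensureDefinitions(["main.cvd", "daily.cvd"], stubFetch, demoDir);
```

Skipping files that already exist keeps repeat invocations on a warm container cheap, at the cost of never refreshing stale definitions; a real setup would need some freshness check on top of this.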

mhart commented 5 years ago

Ok, I've published clamav to the yumda repo. You can install it using:

docker run --rm -v "$PWD/layer":/lambda/opt lambci/yumda:2 yum install -y clamav

(assumes a ./layer directory that you'll zip up and deploy as a Lambda layer)

The DB dir has been compiled as /tmp/var/lib/clamav as explained above, so you'll need to make sure the *.cvd files are in there.

So, for example, if you yum install -y clamav-data using the yumda image, then that will create /lambda/tmp/var/lib/clamav/*.cvd.

I haven't tested it in any way, so please give it a shot and let me know how it goes and whether any improvements could be made.

mhart commented 5 years ago

Oh, I made it for Amazon Linux 2 (i.e., for the nodejs10.x runtime).

mhart commented 5 years ago

So as I said, this is now available. Going to close this issue

StevenACoffman commented 5 years ago

Thanks! This worked in my test!

mhart commented 5 years ago

Oh wow, nice 👍

ghost commented 4 years ago

Hi,

I'm trying to run freshclam inside a node12.x lambda and keep getting the following error:

error while loading shared libraries: /var/task/lib/libclamav.so.9: invalid ELF header

I got all the /bin and /lib files by running this command:

lambci/yumda:2 yum install -y clamav

And then manually zipped them into my own lambda package.

Any idea about what could be wrong?

mhart commented 4 years ago

It looks like you've deployed it as a package, not a layer (so it's sitting under /var/task instead of /opt). The instructions in the README show how to package up yumda libraries into layers.

ghost commented 4 years ago

So, after posting my message I tried with the SAM approach (using layers) and still got the same error. Then I saw the note about the packaging bug with SAM and tried removing the symlinks and renaming the lib files. That worked.

I also tried the zip approach, but could not get it working. Since I'm on Windows, I assume it's something I'm doing wrong.

Anyway, many thanks for your help, I know where the issue is now 👍

Hyyy6 commented 4 years ago

Hi @mhart ,

I'm experiencing the same issue as @marcmarcet on a nodejs12.x Lambda: /opt/layer/bin/freshclam: error while loading shared libraries: /opt/layer/lib/libclamav.so.9: invalid ELF header. The layer was created with docker run --rm -v "$PWD/layer":/lambda/opt lambci/yumda:2 yum install -y clamav and deployed manually.

I had to explicitly set the LD_LIBRARY_PATH environment variable for freshclam to look for shared libraries in the right place. I previously had it deployed as one package and got the same error. I also don't get how using a layer makes any difference if all the files end up in the same environment, just in a different directory.

I was also thinking of compiling ClamAV from source on an EC2 instance to make sure it's built for the right environment (which is supposedly what the invalid ELF header error points to), but at this point I doubt it would make any difference.

I would appreciate any thoughts on my issue.

mhart commented 4 years ago

Why is it looking for it in /opt/layer?

mhart commented 4 years ago

Ah, you need to go into your layer directory before you zip it up – it should unpack into /opt on Lambda – not /opt/layer. And you won't need to adjust your LD_LIBRARY_PATH because /opt/lib is already on that path
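One way to catch this before deploying is to look at the entry names inside the zip. The sketch below is an illustration, not part of yumda: `layerLayoutProblems` is a hypothetical helper, and it assumes the entry names come from something like `unzip -Z1 layer.zip` and that the layer is expected to provide both `bin/` and `lib/`, as the clamav one does.

```typescript
// Given the entry names of a layer zip, report anything that would stop the
// files from landing directly under /opt/bin and /opt/lib on Lambda.
function layerLayoutProblems(entries: string[]): string[] {
  // First path segment of every entry, ignoring a leading "./".
  const topLevel = new Set(
    entries
      .map((entry) => entry.replace(/^\.\//, "").split("/")[0])
      .filter((segment) => segment.length > 0),
  );

  const problems: string[] = [];
  for (const dir of ["bin", "lib"]) {
    if (!topLevel.has(dir)) {
      problems.push(`no top-level ${dir}/, so nothing will appear under /opt/${dir}`);
    }
  }

  // A single wrapper directory is the typical sign of zipping from the parent.
  const names = Array.from(topLevel);
  if (names.length === 1 && names[0] !== "bin" && names[0] !== "lib") {
    problems.push(`everything is nested under ${names[0]}/ and will unpack to /opt/${names[0]}`);
  }
  return problems;
}

// A zip created from the parent directory versus one created inside ./layer.
const nested = layerLayoutProblems(["layer/bin/freshclam", "layer/lib/libclamav.so.9"]);
const flat = layerLayoutProblems(["bin/freshclam", "lib/libclamav.so.9"]);
```

Here `nested` lists three problems (no `bin/`, no `lib/`, and the `layer/` wrapper), while `flat` is empty.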

mhart commented 4 years ago

So something like this (untested):

docker run --rm -v "$PWD/layer":/lambda/opt lambci/yumda:2 yum install -y clamav
cd layer
zip -yr ../layer.zip .
cd ..
# deploy layer.zip as the layer

Hyyy6 commented 4 years ago

No luck, unfortunately. Same issue, different directories, so the root cause must be something else: /opt/bin/freshclam: error while loading shared libraries: /opt/lib/libclamav.so.9: invalid ELF header

Thanks for the right suggestion with the structure of layer .zip archive though!

mhart commented 4 years ago

What environment are you in? Can you show the exact steps you're using to create and bundle your layer?

Hyyy6 commented 4 years ago

I'm using WSL with Ubuntu 18.04.

  1. docker run --rm -v "$PWD/layer":/lambda/opt lambci/yumda:2 yum install -y clamav
  2. cd layer
     zip -yr ../layer.zip .
     cd ..

Then I deploy the archive as a layer, setting up runtime to nodejs12.x (same as my lambda function), and add this layer to my function.

Running the docker run command from Windows PowerShell doesn't make any difference, apart from having to deal with a reference error caused by Windows using backslashes.

mhart commented 4 years ago

Hmmm, the Windows factor makes me suspicious a little. Are the symlinks working correctly?

mhart commented 4 years ago

Just to eliminate Windows as a factor, try this:

docker run --rm -v "$PWD":/tmp lambci/yumda:2 sh -c \
  'yum install -y clamav && cd /lambda/opt && zip -yr /tmp/layer.zip .'

And then use layer.zip that was created in the current directory.

mhart commented 4 years ago

(maybe add a rm -f /tmp/layer.zip in there before you create the zipfile just to be sure – zip by default will update an existing file instead of overwriting it)

Hyyy6 commented 4 years ago

@mhart It worked, thank you so much! I was sure that the yum commands ran inside the Docker container, and apparently WSL doesn't help in this case either. I simply didn't think Windows could be a factor, as I thought Docker abstracts away whatever environment it's being run on.

mhart commented 4 years ago

The yum command was running in your docker container – the problem is (I'm guessing) that when it's writing everything to the mounted volume, things like symlinks (and other Unix-like things) aren't being preserved on your Windows machine. Either that, or the zip command doesn't work the same under WSL as it does on Linux.

If you do the zipping from within the docker container (which is the command I gave you), then it makes sure that symlinks etc are preserved.
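A quick way to test that guess from inside the function is to check whether the library the loader complains about is a symlink, a real ELF file, or something else. This is a diagnostic sketch only: `hasElfMagic` and `inspectLibrary` are hypothetical helpers, and the path in the usage note is the one from the error message in this thread.

```typescript
import { closeSync, lstatSync, openSync, readSync } from "fs";

// True when the bytes start with the ELF magic number: 0x7f followed by "ELF".
function hasElfMagic(header: Uint8Array): boolean {
  return (
    header.length >= 4 &&
    header[0] === 0x7f &&
    header[1] === 0x45 && // 'E'
    header[2] === 0x4c && // 'L'
    header[3] === 0x46 // 'F'
  );
}

type LibraryKind = "symlink" | "elf" | "not-elf";

// Classifies a file without following symlinks, so a preserved link is
// reported as such instead of being resolved to its target.
function inspectLibrary(path: string): LibraryKind {
  if (lstatSync(path).isSymbolicLink()) return "symlink";
  const fd = openSync(path, "r");
  try {
    const header = Buffer.alloc(4);
    const bytesRead = readSync(fd, header, 0, 4, 0);
    return hasElfMagic(header.subarray(0, bytesRead)) ? "elf" : "not-elf";
  } finally {
    closeSync(fd);
  }
}
```

Calling `inspectLibrary("/opt/lib/libclamav.so.9")` and logging the result would show which case applies; a `not-elf` result would be consistent with the invalid ELF header error and with symlinks having been lost somewhere between the container and the zip.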

WSL has been the cause of so many GitHub issues on my repos, sigh.

cfalzone commented 2 years ago

I know this issue is closed, and I can open a new issue if necessary, but I was wondering how people are using this? I am running into an issue getting clamav scanning going. I packed it up as a layer zip as described above, but when I try to run it in a lambda function I am getting this error:

2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e LibClamAV Error: cl_load(): No such file or directory: /tmp/var/lib/clamav
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e ERROR: Can't get file status
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e ----------- SCAN SUMMARY -----------
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Known viruses: 0
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Engine version: 0.102.4
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Scanned directories: 0
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Scanned files: 0
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Infected files: 0
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Data scanned: 0.00 MB
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Data read: 0.00 MB (ratio 0.00:1)
2021-12-08T14:43:46.102000+00:00 2021/12/08/[$LATEST]2448eaddb64946e3b7f56ccb6403383e Time: 0.036 sec (0 m 0 s)

My build script looks like this ...

#!/bin/bash

rm -rf ./layers
mkdir layers

## See: https://github.com/lambci/yumda/issues/3

docker run --rm -v "$PWD/layers":/tmp lambci/yumda:2 sh -c \
  'yum install -y clamav && cd /lambda/opt && zip -yr /tmp/clamav-layer.zip .'

My serverless.yml deploys the layer like:

layers:
  clamav:
    package:
      artifact: layers/clamav-layer.zip

And in my function's handler (TypeScript), I am calling the layer like this ...

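  // Needs: import { execSync } from 'child_process';
  try {
    // With stdio: 'inherit', clamscan writes straight to the function's logs
    // and execSync returns null, so `scan` carries no output here.
    const scan = execSync(`clamscan /tmp/${fileName}`, {
      cwd: '/tmp',
      encoding: 'utf8',
      stdio: 'inherit',
    });
    console.log({ message: 'File scan results', scan, fileId, key });
    return true;
  } catch (err) {
    // execSync throws on any non-zero exit: clamscan exits 1 for an infected
    // file and 2 for an error such as a missing database directory.
    console.log({ message: 'Unable to scan file or file infected.', err, fileId, key });
    return false;
  }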
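
A variation on this handler code that separates "infected" from "scan failed" is sketched below. It relies on clamscan's documented exit codes (0 for clean, 1 for a virus found, 2 for an error) and on the `--no-summary` flag; `scanFile` and `verdictFromExitCode` are hypothetical names, and this is not the poster's code.

```typescript
import { spawnSync } from "child_process";

type ScanVerdict = "clean" | "infected" | "error";

// Maps clamscan's exit status to a verdict. A null status (the process could
// not be started, or was killed by a signal) is treated as an error.
function verdictFromExitCode(status: number | null): ScanVerdict {
  if (status === 0) return "clean";
  if (status === 1) return "infected";
  return "error";
}

function scanFile(path: string, clamscan: string = "clamscan"): { verdict: ScanVerdict; output: string } {
  // Arguments are passed as an array, so the file name is never parsed by a shell.
  const result = spawnSync(clamscan, ["--no-summary", path], { encoding: "utf8" });
  return {
    verdict: verdictFromExitCode(result.status),
    // stdout and stderr are null when the binary could not be started.
    output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}
```

With this shape, a missing database directory shows up as `error` rather than being indistinguishable from an infected file, and the captured output can still be logged by the caller.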

cfalzone commented 2 years ago

Wanted to mention that I got this working. The error I was getting was because I never ran freshclam to download the virus definitions, so clamscan was looking for a definitions folder that never existed. I ended up making a separate Lambda function that runs freshclam every 3 hours.
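
Since a scan against an empty database directory fails with the "No such file or directory" error shown earlier, a handler could fail fast with a clearer message before invoking clamscan. A minimal sketch, assuming the compiled-in dbdir from this thread and that definitions are stored as `.cvd` or `.cld` files; the function names are hypothetical.

```typescript
import { existsSync, readdirSync } from "fs";

// dbdir the yumda clamav package is compiled with, per this thread.
const CLAMAV_DB_DIR = "/tmp/var/lib/clamav";

// ClamAV definitions ship as .cvd files; freshclam may also keep .cld files.
function isDefinitionFile(name: string): boolean {
  return /\.(cvd|cld)$/.test(name);
}

// Returns the definition files found in dbDir, or an empty list when the
// directory does not exist yet.
function listDefinitionFiles(dbDir: string = CLAMAV_DB_DIR): string[] {
  if (!existsSync(dbDir)) return [];
  return readdirSync(dbDir).filter(isDefinitionFile);
}

// Throws a descriptive error instead of letting clamscan fail later.
function assertDefinitionsPresent(dbDir: string = CLAMAV_DB_DIR): void {
  if (listDefinitionFiles(dbDir).length === 0) {
    throw new Error(`No ClamAV definitions found in ${dbDir}; run freshclam or copy them in first`);
  }
}
```

Calling `assertDefinitionsPresent()` at the top of the scan handler turns the silent "Known viruses: 0" situation into an explicit failure.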