singularityhub / sregistry

server for storage and management of singularity images
https://singularityhub.github.io/sregistry
Mozilla Public License 2.0

Proposal - Container security scanning with Clair #14

Open dtrudg opened 7 years ago

dtrudg commented 7 years ago

Clair is the CoreOS project for security static analysis of containers, scanning them for security issues (from databases of known CVEs). I'd like to propose adding support to sregistry for scanning containers using Clair.

Though Clair is centered around docker or appc images, it has been used to scan openvz templates, which are .tar archives - see FastVPSEestiOu/check_openvz_mirror_with_clair. I'm pretty sure something similar could be done for singularity images.

This is something I'm planning to work on, and thought I'd add a ticket here in case it's of interest to others / there are any thoughts? I'm thinking I will be working to:

Would welcome any input on if this is of interest for sregistry, or more generally.

vsoch commented 7 years ago

I've tried out Clair (I made a Singularity version that was going to be for a check) and I stopped because it was requiring a server. This would definitely be great, but a few thoughts:

My general thinking is that I like the idea, but I think it should be a separate module from the registry itself. Eg, it really should be running on its own server, and then have different (independent) commands run:

sregistry check container.img clair
sregistry push container.img --name library/collection:tag

The command line client (sregistry) would have two different servers / applications it is pinging to get that functionality. This also makes sense so we don't develop a hairball of a mess of different applications under one roof. The registry should accept and serve images, and be optimized for that. The security bits (+1 for important!) should be done separate to that.

Your approach sounds good! I can definitely add the endpoint for checks to the sregistry client. To be clear on your last point, I don't think the right place is with the singularity registry application itself, but for the client (sregistry).

I anticipate you will run into challenges that clair wants .tar.gz layer things, and we have entire images! Let's use this issue as a board to discuss things as they come up :)

remyd1 commented 7 years ago

Refers to: https://github.com/singularityhub/singularityhub.github.io/issues/33

Or, as a local tool: https://github.com/singularityware/singularity/issues/818

dtrudg commented 7 years ago

Container scanning with Clair turns out to be nice and straightforward, tested against Clair 2.0.0. A singularity image, exported to tar.gz, masquerades as a single layer docker image for Clair's scanning purposes. I was able to scan an oldish, fat ubuntu singularity image straightforwardly:

Gist - Scan a Singularity image with Clair
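For anyone following along, the request involved can be sketched roughly as follows. This is a minimal sketch, assuming the Clair v1 JSON API (a `POST` to `/v1/layers` with `Format` set to `Docker`); the function name, layer name, and URL are made up for illustration, and the tar.gz is assumed to be served over HTTP from somewhere Clair can reach.

```python
import json


def clair_layer_request(layer_name, tarball_url):
    """Build the JSON body for POST /v1/layers (Clair v1 API, assumed).

    The exported Singularity image is presented as one Docker layer with
    no parent, so "ParentName" is left out.
    """
    return {
        "Layer": {
            "Name": layer_name,    # any unique id, e.g. a hash of the image
            "Path": tarball_url,   # where Clair can fetch the .tar.gz
            "Format": "Docker",    # the tarball is treated as a Docker layer
        }
    }


# Hypothetical values, for illustration only
body = clair_layer_request("ubuntu-fat-img", "http://10.0.0.5:8088/ubuntu.tar.gz")
print(json.dumps(body, indent=2))
```

Sending the body and fetching the results afterwards is left out here, since that part depends on how the Clair server is deployed.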

I'll go ahead and create a python tool to do this in a bit more friendly manner. I do like the idea of having something like an sregistry check command where a user can get this done.

My personal bias is that container scanning should be an optional feature of the registry. Having scan results collected and stored separate from the container storage location (registry) is a bit awkward. Also, it would be nice for scheduled scans to be able to show people what issues their containers are picking up as they age.

vsoch commented 7 years ago

+1 - if you want to write this into python (or I'd be happy to actually, it doesn't look very long!) I'll add to sregistry! The harder bit is creating and running the server - is there a condensed document we could use to explain how to run a clair server alongside a registry?

dtrudg commented 7 years ago

@vsoch - yeah, getting a scan result is surprisingly easy. There is a bunch of filtering that is probably required though. Clair spits out a lot of CVEs for an up-to-date Ubuntu. They are genuine security vulns, but not of interest to most as they are low/minimal severity. A filter on severity for output, tabulation / html report option would be good. That's something I'd be happy to work up given a bit of time.
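A rough idea of the severity filter, as a sketch only. It assumes the report has the Clair v1 layer shape (`Layer` -> `Features` -> `Vulnerabilities`, each with `Name` and `Severity`) and the usual Clair severity names; the sample report is made up, not real scan output.

```python
# Severity names as used by Clair v1 (assumed), lowest to highest
SEVERITIES = ["Unknown", "Negligible", "Low", "Medium", "High", "Critical", "Defcon1"]
RANK = {name: i for i, name in enumerate(SEVERITIES)}


def filter_vulns(layer_report, min_severity="Medium"):
    """Return (feature, version, cve, severity) rows at or above min_severity."""
    threshold = RANK[min_severity]
    rows = []
    for feature in layer_report.get("Layer", {}).get("Features", []):
        for vuln in feature.get("Vulnerabilities", []):
            severity = vuln.get("Severity", "Unknown")
            # Unrecognized severity names are ranked like "Unknown"
            if RANK.get(severity, 0) >= threshold:
                rows.append((feature["Name"], feature["Version"], vuln["Name"], severity))
    # Most severe first
    rows.sort(key=lambda r: RANK.get(r[3], 0), reverse=True)
    return rows


# Made-up report in the assumed shape
report = {"Layer": {"Features": [
    {"Name": "pkg-a", "Version": "1.0", "Vulnerabilities": [
        {"Name": "CVE-0000-0001", "Severity": "Low"},
        {"Name": "CVE-0000-0002", "Severity": "High"}]},
    {"Name": "pkg-b", "Version": "2.1"},
]}}

for row in filter_vulns(report):
    print("{:<8} {:<6} {:<15} {}".format(*row))
```

The same rows could feed a text table or an HTML report; only the final formatting step would differ.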

How to run a Clair server depends on what you want to do with it. It's feasible people would want something just for sregistry, but maybe they want to use it to check docker images etc. too. The easiest thing is to use the 2 docker containers from arminc with instructions from https://github.com/arminc/clair-local-scan as I did in the Gist - since they have the CVE info already loaded into the DB. Starting Clair from scratch, it takes a long time to pull that info. Possibly we could maintain a simple docker-compose.yml somewhere that gives you a Clair server? Production deployments would be expected to be more complex, with use of SSL client certs to ensure you aren't sending your image (with maybe confidential stuff) anywhere but to the trusted Clair server you expect.

vsoch commented 7 years ago

@dctrud that would make sense! We can have a plugins folder for the registry with clair as a submodule, or just include the file natively, depending on your preference.

remyd1 commented 7 years ago

That is awesome !!!!

If you need some help, I created some simple PHP/JS code that parses JSON data and displays it as an HTML table that can be sorted easily. But I think @vsoch prefers the Python version.

Another step forward would be to automatically pull the docker Clair images and convert them to singularity images (cron?) already included in the sregistry. We just need to wait for the full integration of singularity services by @bauerm97.

Thanks, Rémy

dtrudg commented 7 years ago

I'm beginning my step 1 (a stand alone tool that I need) at: https://github.com/dctrud/clair-singularity. Hope to have something usable (with some docs and a docker compose file that'll get clair up easily) by the end of the weekend. After that can chat more about integrating the functionality to the sregistry CLI etc?

@remyd1 - thanks for the code offer, but I'm keeping to Python also.

vsoch commented 7 years ago

+1

dtrudg commented 7 years ago

@vsoch @remyd1 - The initial version at https://github.com/dctrud/clair-singularity is now a working thing with some docs, in case you'd like to try it out. I had a very uninterrupted morning, so got much further than I thought I would.

Now I am starting to think about integration with sregistry there are a few things:

What does @remyd1 need in terms of interacting with Clair/sregistry, and how would the reports be used? It'd be good to know how others might use this, so I don't propose something far too specific to what I have in mind.

vsoch commented 7 years ago

Oh hmm. I think we would arguably want scanning / assurance of security before uploading to sregistry. It's important that the registry is optimized for just getting and providing images, and doesn't need to do things like wait on scans. I completely agree that "another web server" is not ideal, and in fact this is why I stopped pursuing using Clair for singularity checks.

@dctrud - I think if we want to do this right we need some kind of middle ground. I'm guessing Clair prefers being a webby - based thing so it can update itself? Ideally, we could have a (local) client to run clair, with some output to the user every so often to run an update.

dtrudg commented 7 years ago

The trouble with scanning only before upload is that a scan result is a point-in-time thing. You don't just care about whether a container is secure only at the point it is pushed - vulnerabilities come up continuously, and affect an increasing proportion of your existing containers in the registry.

Thinking more about this, I tend to see scans as principally useful when people pull an image. Can imagine scenarios for sensitive data workflows where might e.g. need to blacklist and disallow a pull of a container that's had a critical vuln outstanding for 30 days - or other such things. That type of stuff needs scans of registry contents either internal, or external, to the registry. If we have a user who has generated a bunch of containers, an easy central overview of their current vulnerability status could be a useful thing.
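To make the pull-policy scenario concrete, here is a small sketch. The record layout (`severity`, `first_seen`) and the function name are hypothetical, not something Clair or sregistry defines; the 30 day window is just the example figure from above.

```python
from datetime import datetime, timedelta


def pull_allowed(findings, now, max_age_days=30, blocking=("Critical",)):
    """Return False if any blocking-severity finding has been open too long.

    findings: iterable of dicts with "severity" and "first_seen" (datetime).
    """
    cutoff = now - timedelta(days=max_age_days)
    for finding in findings:
        if finding["severity"] in blocking and finding["first_seen"] <= cutoff:
            return False
    return True


now = datetime(2017, 10, 1)
findings = [
    {"cve": "CVE-0000-0001", "severity": "Critical", "first_seen": datetime(2017, 8, 1)},
    {"cve": "CVE-0000-0002", "severity": "Low", "first_seen": datetime(2017, 1, 1)},
]
print(pull_allowed(findings, now))      # critical finding open for 61 days
print(pull_allowed(findings[1:], now))  # only a low-severity finding left
```

A check like this only works if scan results are refreshed after the push, which is the point about scans being point-in-time.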

@vsoch - Clair is a persistent database backed web app, with interaction only via the JSON API. The update is a scheduled task that its worker process carries out auto-magically. I don't think it's amenable to being run locally by a client - unless the client can spawn docker/singularity services on demand, which probably isn't realistic.

The embedding into a registry approach is the preferred Clair integration, hence the web service design. A key feature is once a layer is added to Clair then there can be notifications sent when new CVEs come up.

https://github.com/coreos/clair/blob/master/Documentation/running-clair.md

Clair can be integrated directly into a container registry such that the registry is responsible for interacting with Clair on behalf of the user. This type of setup avoids the manual scanning of images and creates a sensible location to which Clair's vulnerability notifications can be propagated. The registry can also be used for authorization to avoid sharing vulnerability information about images to which one might not have access.

And the alternative CI scenario they have is designed around post-push scans:

Clair can be integrated into a CI/CD pipeline such that when a container image is produced, the step after pushing the image to a registry is to compose a request for Clair to scan that particular image. This type of integration is more flexible, but relies on additional components to be setup in order to secure.

No matter what ends up in sregistry I'll definitely be implementing either an external tool to do scheduled scanning of things in the registry - or some kind of fork/plugin to add into sregistry itself. That's something we see a need for here, but understand completely your wish to keep sregistry simple and focused on image storage/serving.

vsoch commented 7 years ago

The trouble with scanning only before upload is that a scan result is a point-in-time thing. You don't just care about whether a container is secure only at the point it is pushed - vulnerabilities come up continuously, and affect an increasing proportion of your existing containers in the registry.

This makes sense, but then I would still argue it would still be reasonable to scan an image on push, and again on pull. The image sitting statically as a file won't be much issue, and it doesn't seem efficient with resources to be constantly scanning it. A huge burden for the registry to be constantly scanning upwards of thousands of containers, plus being available for pushing and pulling from gosh knows how many sources.

Thinking more about this, I tend to see scans as principally useful when people pull an image. Can imagine scenarios for sensitive data workflows where might e.g. need to blacklist and disallow a pull of a container that's had a critical vuln outstanding for 30 days - or other such things. That type of stuff needs scans of registry contents either internal, or external, to the registry. If we have a user who has generated a bunch of containers, an easy central overview of their current vulnerability status could be a useful thing.

This would require still constant scanning of images, which isn't reasonable for a single application. It's better to do one simple function really well than try to do everything and be kind of slow, etc.

@vsoch - Clair is a persistent database backed web app, with interaction only via the JSON API. The update is a scheduled task that its worker process carries out auto-magically. I don't think it's amenable to being run locally by a client - unless the client can spawn docker/singularity services on demand, which probably isn't realistic.

They had a command line client which is still used, but not hugely worked on (see my original issue I posted to them). Could it be brought back to life for Singularity? Another idea - what if we just had clair pointed at the same image base (a folder with subdirectories of images) to fire off scans at some point? And then we would have clair send the registry a ping to flag a container if something bad came up? The main complication here, then, would be needing to access the same image base from both applications, which probably means using the same server (poorer performance) or a more advanced setup (that most don't have).

The embedding into a registry approach is the preferred Clair integration, hence the web service design. A key feature is once a layer is added to Clair then there can be notifications sent when new CVEs come up.

We couldn't use some kind of other triggers?

https://github.com/coreos/clair/blob/master/Documentation/running-clair.md

Clair can be integrated directly into a container registry such that the registry is responsible for interacting with Clair on behalf of the user. This type of setup avoids the manual scanning of images and creates a sensible location to which Clair's vulnerability notifications can be propagated. The registry can also be used for authorization to avoid sharing vulnerability information about images to which one might not have access.

Having managed many web applications that work with different APIs and services, I just don't see the average (smaller) institution being able to deploy an infrastructure to handle doing both. Ideally yes, everyone would have kubernetes clusters with a ton of images, but realistically the registry would be deployed on a single leftover server, and probably already burdened with just pushing and pulling.

And the alternative CI scenario they have is designed around post-push scans:

I think this is what would be desired, with a signal that is based on the image being saved / changed, but not reliant on the registry itself. Then the registry is notified if/when there is an issue.

Clair can be integrated into a CI/CD pipeline such that when a container image is produced, the step after pushing the image to a registry is to compose a request for Clair to scan that particular image. This type of integration is more flexible, but relies on additional components to be setup in order to secure.

The building / continuous integration is a separate thing entirely, more akin to singularity hub. The registry is agnostic to the build process, that is up to the user/admins. Some might use slurm / other cluster technology, some might use a cloud service, some might use their own computer, Github with CI, or a private server.

No matter what ends up in sregistry I'll definitely be implementing either an external tool to do scheduled scanning of things in the registry - or some kind of fork/plugin to add into sregistry itself. That's something we see a need for here, but understand completely your wish to keep sregistry simple and focused on image storage/serving.

I think this is a good plan. Don't think much about sregistry to start - build something solid that you think is good for its purpose. Then we can start to figure out the integration. I think there are many options and best to do that when the time comes :)

dtrudg commented 7 years ago

For reference, here's what you get on quay.io when looking at a container repo:

image

And you can go to detailed scan results, e.g.:

https://quay.io/repository/biocontainers/ariba/image/dd247e688e2d5392a899fd38881a281ea51e8011ae0db73eb9f8ebe8758d73a4?tab=vulnerabilities

vsoch commented 7 years ago

That's cool! The difference between a registry and quay.io is that they probably have many servers to run, host, etc, and don't have to deal with the issue of one server being over-burdened with work.

dtrudg commented 7 years ago

Okay - I'm back thinking about this now. @vsoch - is there any appetite for being able to have something to support e.g. an app called sregistry.plugins.clair?

I.e. would you be open to the idea of some kind of plugin app setup for sregistry, where you can load plugin apps that can inject celery tasks, and inject display elements into the container detail view template?

vsoch commented 7 years ago

Definitely! I'm not sure how we would inject elements into a user's view - I think a test, state, related container, and message to the user would need to be stored in its own model, run via a celery task, and then have a view that shows the user (with permissions to the collection) to see flags that were added.
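As a framework-free sketch of the record described here (in the registry this would presumably be a Django model filled in by a celery task), something like the following. All field and function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContainerFlag:
    """One result a plugin task attaches to a container (hypothetical fields)."""
    container_id: int
    collection: str
    test: str      # which check produced the flag, e.g. "clair"
    state: str     # e.g. "pass", "warn", "fail"
    message: str   # text shown to the user


def flags_for_user(flags, allowed_collections):
    """Keep only flags on collections the user has permission to see."""
    return [f for f in flags if f.collection in allowed_collections]


flags = [
    ContainerFlag(1, "library/tools", "clair", "fail", "3 high severity CVEs"),
    ContainerFlag(2, "private/lab", "clair", "pass", "no known CVEs"),
]
print(flags_for_user(flags, {"library/tools"}))
```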

dtrudg commented 7 years ago

Okay, I'll try and get something together as a point for discussion. Probably not a clair plugin first - but something that does something simple as a proof of concept - e.g. links to containers with same name on docker hub.

vsoch commented 7 years ago

Awesome! Looking forward to it. I'm really excited that we now have this open source, collaborative registry, and as a community, we can talk about and create plugins that are needed! And customize away! It's so great :)

dtrudg commented 7 years ago

Now there is some outline plugin structure I'll try to get something done on this in early November.

What I'm thinking about first is how we would present UI elements from a plugin, or add plugin specific pages to navigation? E.g. if I enable a clair-singularity plugin and want the container listing or detail page to now have a column / field with some content from the plugin (vuln count, link to detailed security report for example).

The obvious way to me would be just to insert some sensible plugin points in the existing templates. If a plugin is enabled and provides a view named e.g. container_summary_field we can call it and insert in the container summary list in collections view/container search etc. We could also have e.g. container_detail_field, plugin_menu_items

Choices then are - make these simple HTML snippets, make it AJAXy. etc. etc.

I personally like simple HTML template based rendering. E.g. the plugin has a container_summary_column view that returns a snippet like:

<a href="/clair-singularity/image/1234"><strong>54</strong> vulnerabilities</a>

.. and then in the template there's something like:

{% for snippet in container.plugin_summary_fields %}
  <td>{{ snippet }}</td>
{% endfor %}

@v - do you have thoughts on this kinda stuff?

vsoch commented 7 years ago

heyo! YES definitely! I just am getting off a 12 hour paper working session, so I'll put something together in the next few days. Stay tuned!

vsoch commented 7 years ago

heyo! I haven't forgotten about this - just didn't get to it today. But we merged the plugin start woohoo! I will try to get to this manana.

vsoch commented 7 years ago

Here are some thoughts:

What I'm thinking about first is how we would present UI elements from a plugin, or add plugin specific pages to navigation? E.g. if I enable a clair-singularity plugin and want the container listing or detail page to now have a column / field with some content from the plugin (vuln count, link to detailed security report for example).

I think there are two ways to go about this. Let's discuss the view of a container build to put it in some context. Here it is!

image

Question to Plugin Developer 1: where is the plugin relevant?

First, it occurs to me that the level / location for which a plugin is relevant could vary. We likely will have a few different ones, the most common might be:

The LDAP just added is relevant for users, for example. I think this warrants a need for more detailed groups of plugins (I'll discuss this in detail at the end).

Question to Plugin Developer 2: is it unique or shared?

For a completely unique view, meaning one that will link off of a main page (as with LDAP) and that link will forever belong to the plugin (e.g., nobody would want to add more content to the LDAP login page unless they were editing that plugin directly), the strategy you used to copy the base template into its own view is good I think. We would advise the plugin developer to:

In the case of a unique view, the addition of the plugin to the template where it belongs could be an if statement

{% if 'ldap_auth' in PLUGINS_ENABLED %}
{% include "ldap_auth/login.html" %}
{% endif %}

Question to Plugin Developer 3: For shared views, where to add?

Many plugins might have unique pages, but I would advocate for most to try to use the shared approach. This means that I wouldn't want redundant content for slightly modified versions of the same page under two different plugins, but rather each added based on some logic to the main page. The first thing that comes to mind for the containers example above is adding tabs to correspond with plugins. Basically, for the tab headers and then content, something like:

{% for plugin in PLUGINS_CONTAINERS %}
        <li><a data-toggle="pill" href="#{{ plugin }}">{{ plugin }}</a></li>
{% endfor %}

....
<!-- later down the page -->
{% for plugin in PLUGINS_CONTAINERS %}
        <div id="{{ plugin }}" class="tab-pane fade">
            <div class="col-md-12 card">
            {% include plugin|add:"/main.html" %}
            </div>
        </div>
{% endfor %}

I'm not sure if the include is correct above, but you get the idea! The plugins that have some "main" view to add as a tab to the container's main view would only need to register under PLUGINS_CONTAINERS, and then create a code snippet template called {{ plugin }}/main.html. That seems really easy! They are free to branch off of that as they like. The list of plugins could even be more of a dictionary, if we need additional logic:

{% for plugin in PLUGINS_ENABLED %}
        {% if plugin.containers %}
        <li><a data-toggle="pill" href="#{{ plugin }}">{{ plugin }}</a></li>
        {% endif %}
{% endfor %}

The above assumes the following:

CLAIR_PLUGIN = {"name": "clair_plugin",
                "collections": False,
                "containers": True,
                "users": False}

PLUGINS_CONTAINERS = ( CLAIR_PLUGIN, )
PLUGINS_ENABLED = ('ldap_auth', 'clair_plugin')

then later in the template:

{% for plugin in PLUGINS_CONTAINERS %}
        {% if plugin.name in PLUGINS_ENABLED %}
        <li><a data-toggle="pill" href="#{{ plugin.name }}">{{ plugin.name }}</a></li>
        {% endif %}
{% endfor %}

I was first thinking of deriving PLUGINS_ENABLED from the various specific plugins, but it might be reasonable to just pre-define them so enabling means the user just adds the name, and if they want to take it off, they don't lose the settings.
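On the Python side, the lookup the template loop relies on could be sketched like this. It assumes the dictionary style settings from above; `LDAP_PLUGIN`, `PLUGINS_AVAILABLE`, and `plugins_for` are hypothetical names added for the example.

```python
# Hypothetical settings, following the dictionary idea
CLAIR_PLUGIN = {"name": "clair_plugin", "collections": False,
                "containers": True, "users": False}
LDAP_PLUGIN = {"name": "ldap_auth", "collections": False,
               "containers": False, "users": True}

PLUGINS_AVAILABLE = (CLAIR_PLUGIN, LDAP_PLUGIN)
PLUGINS_ENABLED = ("ldap_auth", "clair_plugin")


def plugins_for(view, available=PLUGINS_AVAILABLE, enabled=PLUGINS_ENABLED):
    """Names of enabled plugins that asked to render in the given view."""
    return [p["name"] for p in available
            if p["name"] in enabled and p.get(view)]


print(plugins_for("containers"))
print(plugins_for("users"))
```

Removing a name from `PLUGINS_ENABLED` hides the plugin everywhere while its dictionary of settings stays in place, which is the "don't lose the settings" behavior mentioned above.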

This strategy (I think) would introduce a standard way (and clear instructions) to add content / menu item to link to a plugin of interest.

The obvious way to me would be just to insert some sensible plugin points in the existing templates. If a plugin is enabled and provides a view named e.g. container_summary_field we can call it and insert in the container summary list in collections view/container search etc. We could also have e.g. container_detail_field, plugin_menu_items

This sounds similar to what I was describing! Do you think we would want that level of detail? Should we perhaps start simple, and then add detail when it's warranted? (Eg, instead of container_summary_field rendered in a tiny specific spot, just create an entire general tab for the plugin.)

Choices then are - make these simple HTML snippets, make it AJAXy. etc. etc.

What do you mean AJAXy?

I personally like simple HTML template based rendering. E.g. the plugin has a container_summary_column view that returns a snippet like: <a href="/clair-singularity/image/1234"><strong>54</strong> vulnerabilities</a> .. and then in the template there's something like: {% for snippet in container.plugin_summary_fields %} <td>{{ snippet }}</td> {% endfor %}

Is one of the ideas that I proposed above along these lines? Basically, define groups and logic in the settings, and then provide the developer with the "entrypoints" to the application where he/she can add views / urls to know the plugin will create a tab, etc.

Summary

So in summary, the steps to add a plugin we can generally say are:

  1. decide the plugin type, one of collection, container, or user, (or other that needs to be added)
  2. Add it to the config.py, likely some dictionary with options to specify where it wants to render, that is then added to PLUGINS_ENABLED automatically
  3. For integration into views (what I was calling shared), based on the options that are enabled, create the required views and urls. For unique views, add some custom logic into current templates. And note - we should be very critical of adding a shared view, because it is more tangled up with the main codebase. We should encourage the developer to try for the other kind. Advise the developer to:
    • use base templates and style
    • check urls for conflict or blocking

1. Plugin Types

PLUGINS_CONTAINERS = ('clair_security', 'container_asciinema', )
PLUGINS_COLLECTIONS = ('osf_integration', 'collection_markdown', )
PLUGINS_USERS = ('ldap_auth', )
PLUGINS_ENABLED = ()
PLUGINS_ENABLED = (PLUGINS_ENABLED +
                   PLUGINS_CONTAINERS +
                   PLUGINS_COLLECTIONS +
                   PLUGINS_USERS)

With this method, we would parse through the lists on each relevant page. Plugins that shouldn't have links from collections / containers automatically derived (but are still enabled) should be added to PLUGINS_ENABLED.

The second idea gave more variables to each plugin, like:

CLAIR_PLUGIN = {"name": "clair_plugin",
                "collections": False,
                "containers": True,
                "users": False}

PLUGINS_CONTAINERS = ( CLAIR_PLUGIN, )
PLUGINS_ENABLED = ('ldap_auth', 'clair_plugin')

And we parse through PLUGINS_ENABLED and the various views to determine rendering.

Let me know your thoughts! If you want I can give a first pass at an updated "How to add a plugin" doc, and then you can try to see how it works (or doesn't for Clair). Which of the approaches do you like? Is something missing?

dtrudg commented 7 years ago

@vsoch - I'll get back to you on this more next week, but the display stuff you are talking about seems roughly what I was thinking. On plugin types I don't think we should complicate the config file with extra lists. If we have different types, PLUGINS_ENABLED should still just list the name, and the other stuff gets done automagically, from information contained in the plugin's __init__.py
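Roughly what I have in mind, as a sketch only. `PLUGIN_TYPES` is a hypothetical attribute a plugin's `__init__.py` would set, and `make_plugin` is a stand-in for actually importing the plugin package (a real version might use `importlib.import_module`).

```python
import types


def make_plugin(name, **attrs):
    """Stand-in for importing a plugin package by name."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    return module


def group_by_type(plugin_modules):
    """Map each declared view type to the plugins that render there."""
    groups = {}
    for module in plugin_modules:
        # Plugins that declare nothing simply appear in no group
        for view in getattr(module, "PLUGIN_TYPES", ()):
            groups.setdefault(view, []).append(module.__name__)
    return groups


# settings would only need: PLUGINS_ENABLED = ("ldap_auth", "clair_plugin")
loaded = [
    make_plugin("ldap_auth", PLUGIN_TYPES=("users",)),
    make_plugin("clair_plugin", PLUGIN_TYPES=("containers",)),
]
print(group_by_type(loaded))
```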

vsoch commented 7 years ago

ah great idea! Then we can have some definitive (or suggested) set of __init__.py settings / configs to help guide plugin makers for how to go about doing it.

Totally no worries on rushing this! It's important we do it right, and not in any time rush. I'm really excited and loving working on this! Looking forward to hearing the details next week!

vsoch commented 6 years ago

hey @dctrud ! It looks like this issue got turned into one about plugins, and we forgot about clair! I have some time now and think this would be good to work on - is it still of interest to you?

dtrudg commented 6 years ago

@vsoch - yes, still interested, but about to take a two week break from work and computers in general. Am using this happily at the moment to get the scan info I need, albeit in unfriendly text form: https://github.com/dctrud/clair-singularity

vsoch commented 6 years ago

okay! Let me take a shot at (at least) catching up to what you've done thus far. Enjoy the holiday!! :D :christmas_tree:

vsoch commented 6 years ago

@dctrud want to get your thoughts on a few things, now that I have a sense of clair!

It seems like there are two options here:

  1. Integration with sregistry as another docker-compose instance. This is logical given that a single registry goes up and down with compose, and is tightly coupled into "one webby thing." It also makes sense in the context of having clair runs be celery tasks that are run periodically. Under this model, we add the clair image as another container in docker-compose.yml, and since they live on the same server, there is an extra step that creates a duplicate of the image (in the .tar.gz format) to be scanned. We don't need clair to do any kind of pull to get it.

  2. External integration would mean having clear instructions for deploying a clair scanner via some external way (either a separate docker container, as you showed in your gist) or another server entirely (still likely with docker!) Then the integration part would be more like a webhook - you bring up the clair server, turn on some switch in Singularity Registry and give it the address for the server, and then the clair receives a notification when an image is updated (along with a URL to pull and test it). Then the .tar.gz of the image would be stored on this separate server to be tested regularly unless the image changes on the main server, in which case there is another webhook to update the file.

The first would likely be fine for small registries but (and I might have articulated this before) any kind of scaling might be challenging. Actually, now that I think of it - I think the webhook idea would be ideal! Imagine what kind of (other) cool things could be built with a simple way to send a notification to change events. Let's chat about the different kind of events we would want, and perhaps how to validate the hooks? My thinking is:

That simple list (I think) would give us a good start. Then the Clair integration would broadly include the addition of webhooks above, and some kind of worker that can receive the hook, manage the conversion, run it, and then send back a report. The "sending back a report" bit is the harder part, because we would arguably want the container security reports to appear with the containers, or have some action taken when a container is found to have a vulnerability (e.g disable pulling). Or maybe we want to keep them separate? How do you think that might look? Let's first chat about the strategy for implementation, and then I think we will hit some of these details.
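On validating the hooks, one common approach is a shared secret and an HMAC signature over the request body. A minimal sketch, assuming both servers are configured with the same secret; the event name, URL, and function names are made up for illustration.

```python
import hashlib
import hmac
import json


def sign_event(secret, event):
    """Serialize an event and return (body, hex signature) for the request."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_event(secret, body, signature):
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


secret = b"shared-secret"  # hypothetical, configured on both servers
event = {"event": "container.updated",  # hypothetical event name
         "url": "https://registry.example.org/containers/42/download"}

body, sig = sign_event(secret, event)
print(verify_event(secret, body, sig))         # untouched body
print(verify_event(secret, body + b"x", sig))  # tampered body
```

The registry would send the signature in a request header, and the worker would verify it before pulling the image from the URL in the event.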

dtrudg commented 6 years ago

Hi @vsoch, and happy Boxing Day (since I'm in England and enjoying mince pies and Christmas cake)

I think the no 2. webhook option sounds good, and also the 6 initial events should cover most things.

Regarding sending back a report, I'm not completely sure about a good way to do that. I was thinking maybe not bringing in the report to sregistry at all. Perhaps there's a plugin or other implementation on the sregistry web app that can add a column to container summary tables, and a region to the detail view - displaying content (badge or detail) that is served up by the external service. In the Clair case that'd mean the external service has to have its own web app with an API, wrapping the main Clair scanner container and serving up badges and nicely formatted detailed reports. A bit more work, but more flexible for other types of external service that might need integration? Making it a bit like going out to Travis and CircleCI from GitHub?