intel / cve-bin-tool

The CVE Binary Tool helps you determine if your system includes known vulnerabilities. You can scan binaries for over 200 common, vulnerable components (openssl, libpng, libxml2, expat and others), or if you know the components used, you can get a list of known vulnerabilities associated with an SBOM or a list of components and versions.
https://cve-bin-tool.readthedocs.io/en/latest/
GNU General Public License v3.0

Empty report on mvn based java projects #4139

Open hmw42 opened 3 months ago

hmw42 commented 3 months ago

Hello,

I prepared a test project containing a rancid version of log4j. On a Debian-based machine, cve-bin-tool reports log4j nicely when run against this test project. If I prepare a Docker image (Debian, with cve-bin-tool from GitHub or installed via pip) to use cve-bin-tool from a CI process and run it from there, it reports nothing. It almost feels like I missed something.

The debug log of cve-bin-tool seems to diverge when it comes to processing the pom.xml file of the test project. This is the good case:


DEBUG    cve_bin_tool.VersionScanner - Scanning file: /tmp/tmp_welle/cve-bin-tool-mvimbbfq/RandomNumberGenerator.war.extracted/META-INF/maven/tutorial/RandomNumberGenerator/pom.xml    version_scanner.py:208
DEBUG    cve_bin_tool - Validating /tmp/tmp_welle/cve-bin-tool-mvimbbfq/RandomNumberGenerator.war.extracted/META-INF/maven/tutorial/RandomNumberGenerator/pom.xml against the schema in /tmp/sboms/lib/python3.11/site-packages/cve_bin_tool/schemas/pom.xsd    validator.py:36

In the bad case the validating step is missing; instead, the next file is scanned. Any hints would be very much appreciated. If I need to provide more information, please feel free to ask.

All the best hmw

terriko commented 2 months ago

If you're able to share your pom.xml file (or point us to where it's found in someone else's repo) that would be very helpful for further debugging!

I suspect that you may be running into one of the issues we found here:

To summarize the relevant parts of that issue (which is really hard to read thanks to some email translation issue in GitHub):

  1. the version data was stored in a variable instead of directly in the <version> section.
    • I think we need to fix cve-bin-tool to pre-parse the pom.xml a bit better to handle these.
    • No one is working on this yet, but patches are very much welcome!
  2. the productname didn't have an exact match in any CPE (the identifier used to look up vulnerabilities). This probably isn't the case for log4j specifically, but may affect other products in your pom.xml files.
    • We've got a GSoC contributor who will be working on solving this issue by adding improved support for PURL, another identifier used to look up vulnerabilities. This work will include some way for us to add mappings from "what it's called in pom.xml" to "what it's called in vulnerability lookup". Many of these mappings are already available to import, but we will likely add more as we find out what's needed.

I'm guessing that the problem is related to 1 above because if it was just number 2 you should have at least gotten a list of products (just no vulnerability information). But it's entirely possible that you've hit another case where we're not parsing pom.xml as expected.
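To illustrate case 1 above, here is a minimal sketch of expanding `${...}` placeholders from a pom.xml `<properties>` block before reading dependency versions. This is not cve-bin-tool's parser: the function name `resolve_dependency_versions`, the sample POM, and its version number are made up for illustration, and it only handles properties defined in the same file (no parent POMs, profiles, or built-in properties such as `project.version`).

```python
import re
import xml.etree.ElementTree as ET

# Maven POM files normally declare this default namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_dependency_versions(pom_text):
    """Return (artifactId, version) pairs with ${...} placeholders expanded.

    Hypothetical helper for illustration only; not part of cve-bin-tool.
    """
    root = ET.fromstring(pom_text)

    # Collect the children of <properties> as a name -> value mapping.
    properties = {}
    props = root.find("m:properties", POM_NS)
    if props is not None:
        for child in props:
            name = child.tag.split("}", 1)[-1]  # strip the XML namespace
            properties[name] = (child.text or "").strip()

    def expand(value):
        # Unknown placeholders are left untouched rather than guessed.
        return PLACEHOLDER.sub(
            lambda m: properties.get(m.group(1), m.group(0)), value
        )

    results = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        results.append((artifact.strip(), expand(version.strip())))
    return results


# Made-up sample: the version lives in a property, not in <version> itself.
SAMPLE_POM = """\
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <log4j.version>2.14.1</log4j.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>${log4j.version}</version>
    </dependency>
  </dependencies>
</project>
"""

print(resolve_dependency_versions(SAMPLE_POM))
```

A parser that reads `<version>` literally would see the string `${log4j.version}` here, which cannot match any version in a vulnerability lookup.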

If you're up for taking a look at the code to see what went wrong, the pom.xml parser is in parsers/java.py and starts here: https://github.com/intel/cve-bin-tool/blob/f4c7e91e8157b1d24eb81836c3da0b3c2558c197/cve_bin_tool/parsers/java.py#L68

It sounds like it needs some additional cases before it's going to work the way people expect, but other than the issue described in #4101 I'm not sure what we're missing, so if you've got any insights into what pom.xml files look like and can help us make a clear path for someone to write the code, that would help a lot in getting this fixed sooner.

terriko commented 2 months ago

More specifically, is there a difference between the pom.xml file in the Docker container vs the one on the Debian system?

Re-reading this, I may be barking up the wrong tree, but it does seem like a pom.xml parser issue.

hmw42 commented 2 months ago

Terri,

thanks for looking into that. I saw the other ticket, but I kind of ruled it out, mainly because I can get a report in a different runtime environment. Today I stripped down pom.xml to a bare minimum, no success. Then I followed your advice and had a look into the source.

In version_scanner.py in scan_file() the show comes to a quick end, because pom.xml is neither an executable nor a linux kernel, thus scan_file() returns None and the control flow doesn't reach the java parser.

In is_executable(), the very first thing is to look for the file utility; if it is not there, the code tries to mimic file's behaviour in pure Python (I think) in is_binary(). Well, the file utility is missing in the Docker environment, so is_binary() comes to the right conclusion that pom.xml is not a binary -> nothing else to be done.

On the other hand, if the file utility is installed, pom.xml is indeed in the list of valid binary formats + special files and is_executable() does not return False here. I'm not deep enough into cve-bin-tool to understand why the same logic applies to, for instance, PE binaries and pom.xml files. It looks like a 'quick fix' for an issue ;).

The control flow then falls to the last return True in is_executable(), stating that pom.xml is a binary (which is kind of odd). In that case scan_file() doesn't return early and does its thing, thinking it's dealing with a binary.
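To make the divergence easier to follow, here is a toy model of the gate as I understand it. It is not cve-bin-tool's code: `looks_binary`, `is_scannable`, and the `file_utility_available` flag are hypothetical names, and the ELF-magic test is only a stand-in for whatever the real checks do.

```python
def looks_binary(data: bytes) -> bool:
    """Crude stand-in for a pure-Python binary check: ELF magic only."""
    return data.startswith(b"\x7fELF")


def is_scannable(filename: str, data: bytes, file_utility_available: bool) -> bool:
    """Toy model of the gate described above (not cve-bin-tool's real code)."""
    if not file_utility_available:
        # Fallback path: only real binaries pass, so pom.xml is rejected
        # and the language parser is never reached.
        return looks_binary(data)
    # `file` path: binaries pass, and so do allow-listed special files.
    special_files = ("pom.xml",)
    return looks_binary(data) or filename.endswith(special_files)


pom = b"<project/>"
print(is_scannable("pom.xml", pom, file_utility_available=True))   # True
print(is_scannable("pom.xml", pom, file_utility_available=False))  # False
```

The same input gets a different answer depending only on whether the utility is installed, which matches the two debug logs.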

So, deploying the file utility in the docker container gives the expected results.
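For anyone else running this in CI, a small pre-flight check along these lines would have saved me some time. It is only a sketch: `missing_tools` is a name I made up, and the default list contains just the `file` utility because that is the one that bit me here.

```python
import shutil


def missing_tools(required=("file",)):
    """Return the names in `required` that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing utilities:", ", ".join(missing))
else:
    print("All required utilities found.")
```

A CI step could call `missing_tools()` before the scan and abort when the list is not empty, so an empty report is not mistaken for a clean one.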

All the best hmw

terriko commented 2 months ago

Hm, that sounds like a side effect of how the binary and language checkers were separate. I agree that it's probably the expected outcome, but maybe it shouldn't be? The way we changed things to allow all scanners to run on arbitrary directories should be possible to generalize and do even inside a container, it's just that we don't test cve-bin-tool against containers regularly now.

In the meantime, I can at least suggest some other tools that might work better for you: Trivy does vulnerability scans and was designed with docker containers in mind, so I'd hope that they could handle this case better than we can. You might also have luck with Tern which is a component analysis tool that is again, designed to work with containers.

Those two tools are part of why we haven't really focused on docker support in cve-bin-tool -- it seemed to me like there were good enough tools in the space already. But if there are gaps where we are useful, I'm happy to know about niches where we might help!