Remove compiler dependency while testing test_scanner and use more native solutions like struct or just writelines in byte format.

intel / cve-bin-tool

The CVE Binary Tool helps you determine if your system includes known vulnerabilities. You can scan binaries for over 200 common, vulnerable components (openssl, libpng, libxml2, expat and others), or if you know the components used, you can get a list of known vulnerabilities associated with an SBOM or a list of components and versions.

https://cve-bin-tool.readthedocs.io/en/latest/

GNU General Public License v3.0

1.25k stars 466 forks source link

Remove compiler dependency while testing test_scanner and use more native solutions like struct or just writelines in byte format. #638

Closed Niraj-Kamdar closed 4 years ago

Niraj-Kamdar commented 4 years ago

Currently, we are compiling c files to produce binary files that contains strings which can be found in the package. Most of the strings generated by compiling c file is just the compiler dump which we are ignoring anyway. So, why don't we use struct(as mentioned by @pdxjohnny) or plain binary strings which will save time and space. I was experimenting on struct and I found out binary file produced by using struct is same as we generate from just writing binary strings. Here is the script I used for this experiment.

import struct

A = [b"This program is designed to test the cve-bin-tool checker.\n",
     b"It outputs a few strings normally associated with bzip2-1.0.2\n",
     b"They appear below this line.\n",
     b"------------------\n",
     b"bzip2-1.0.2\n"]

d = dict(map(lambda x: (len(x), x), A))
fmt = ("{}s"*len(A)).format(*d.keys())

s = struct.pack(fmt, *d.values())

with open("struct.out", "wb") as f:
    f.write(s)

with open("bytestrings.out", "wb") as f:
    f.writelines(A)

Here, struct.out is same as bytestrings.out!

johnandersen777 commented 4 years ago

Nice job digging into this! :D

The following might be helpful (I added it to test/test_strings.py to see what happened)

    def test_mkbin(self):
        with tempfile.NamedTemporaryFile() as fileobj:
            fileobj.write(("bzip2-1.0.2").encode("ascii") + b"\n")
            fileobj.seek(0)
            strings = Strings()
            strings.filename = fileobj.name
            print(strings.parse().split("\n"))
            # Prints:
            # ['bzip2-1.0.2', '']

Next steps here are probably to add a new version of test_binaries and _binary_test in test_scanner.py, have the new version also take the strings to have in the "binary" and then start moving tests from the old test_binaries to the new function. Sound good?

Niraj-Kamdar commented 4 years ago

Yeah, I will start working on this.

johnandersen777 commented 4 years ago

It's worth noting that I of course only tested that on our version of strings, not the real binary, it is probably worth cross checking that it works with the real strings

johnandersen777 commented 4 years ago

If we're still using that, I can't remember if it gets used at all

Niraj-Kamdar commented 4 years ago

Yes, I have checked it. It's working fine with strings command too.

Niraj-Kamdar commented 4 years ago

I have an idea which will make checker writing process easy and we can avoid writing test_scanner for short tests completely.
When we add new checker we add package and vendor name in it. So, we can get package, vendor name, guess contains strings and version regex from the checker. We can use cached db to find cves that should be in this version and cves that shouldn't be in this version. So, we have all the parameters needed for _binary_test.
To make it practical we have to either make list of guess_contains strings and version regex string global or use class whichever seems more appropriate.

@terriko, @pdxjohnny needs your opinions on it.

Niraj-Kamdar commented 4 years ago

To make above idea practical, We can have a base checker like below and every checkers inherit it. If we do this in most of the case we just have to correctly initialize inherited class. Also, instead of specifying package, vendor pair in docstring, we can put it as class attribute if there isn't any particular reason for the above implementation.

class Checker:
    def __init__(self):
        contains = []
        regex = []
        vendor = ""
        package = ""

    def guess_contains(self, lines):
        for line in lines:
            if any(contain in line for contain in self.contains):
                return True
        return False

    def get_version(self, lines, filename):
        version_info = dict()
        if "bzip" in filename.lower():
            version_info["is_or_contains"] = "is"

        elif self.guess_contains(lines):
            version_info["is_or_contains"] = "contains"

        if "is_or_contains" in version_info:
            version_info["modulename"] = self.package
            version_info["version"] = regex_find(lines, *self.regex)

        return version_info

Here is an example of inherited class. In this case we don't have to add other methods although for special cases we can override the above methods.

class OpensslChecker(Checker):
    def init(self):
        contains = [
            "part of OpenSSL",
            "openssl.cnf",
            "-DOPENSSL_"
        ]
        regex = [
            r"part of OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
            r"OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
        ]
        vendor = "openssl"
        package = "openssl"

johnandersen777 commented 4 years ago

I'd say those properties would be better at the class level than the instance level. i.e.

class OpensslChecker(Checker):
    CONTAINS = [
        "part of OpenSSL",
        "openssl.cnf",
        "-DOPENSSL_"
    ]
    REGEX = [
        r"part of OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
        r"OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
    ]
    VENDOR_PACKAGE = [("openssl", "openssl",)]

Also, some checkers have multiple vendor package pairs, so we probably want a list of tuples.

@terriko, thoughts?

terriko commented 4 years ago

tuples are good. I'd say those are class properties and don't need an init, yes. And yes, we've been talking about inheritance for the checkers for quite some time, but I'd been mentally pushing it off until after 1.0 released. So now is the perfect time to talk about this.

I'm also in favour of removing the compiler dependency for tests, but one thing we'll lose is the early warning if someone writes a checker that's so general that it detects the version of gcc or glibc. I think we can write specific mocked up tests for those (and they'd be better anyhow because we could make it more explicit what's going wrong in the test messages), we just need to not forget to do that and put it in the short tests so people see it early.

Niraj-Kamdar commented 4 years ago

I agree we should use class property and tuples seems fine.

Yes, we can make a test file which contains some of commonly found strings that looks like version string and test it against every checker and raise an error if checker catch false positive.

Niraj-Kamdar commented 4 years ago

I have also scanned cli.py an I am highlighting here parts of the code which will and may require changes.

Update      cli.py:79  load_checkers 
Update      cli.py:98  available_checkers 
Update      cli.py:179 scan_file
Remove      cli.py:69  vendor_package_pairs
May affect  cli.py:103 remove_skiplist
May affect  cli.py:114 print_checkers
May affect  cli.py:439 main

We also have to change test_checkers.py and test_scanner.py. We probably don't have to change test_cli.py because we are only using main function here( We aren't testing inner functions.)

Niraj-Kamdar commented 4 years ago

I am thinking about putting common checker base class in checker module's __init__.py and every other checker will inherit it. If you think this structure is okay then I will start working on it.

johnandersen777 commented 4 years ago

@Niraj-Kamdar Sounds good!

Niraj-Kamdar commented 4 years ago

I have also scanned `test_scanner.py` & `test_cli.py` and I am highlighting here parts of the code which will and may require changes.	action	what	line
remove:	BINARY_PATH var	test_scanner.py:21	global
remove:	`make all` call	test_scanner.py:37	setup_class
add:	named temp files with binary strings	test_scanner.py:37	setup_class
add:	delete temp files	test_scanner.py:54	teardown_class
replace:	BINARY_PATH with new temp path	test_scanner.py:65	scan_file
remove:	vendor_package_pairs method	cli.py:69	-
remove:	test_vendor_package_pairs	test_scanner.py:67	-
may affect:	self.scan_file()	test_scanner:100	_binary_test
replace:	`make` call	test_cli:213	test_unknown_warning
replace:	`make` call	test_cli:232	test_quite_mode

We won't need test_file now. so, we can safely remove it.

Niraj-Kamdar commented 4 years ago

Since, I am going to refactor as mentioned in issue #647 , I feel like I should do this with re-factorization. There are a couple of reasons for that:

Currently, scanner class do both job of finding versions and CVEs, In issue #647 I am going to refactor that in two parts version_scanner and cve_scanner. So, test_version_scanner only test version_scanner part while test_cve_scanner will test: __is cve_scanner returning CVEs from the previous version?__

Here, cve_scanner is independent from version_scanner So, testing cve_scanner for every checkers we make isn't really necessary and probably bad way of testing cve_scanner. In case of cve_scanner we should focus on corner cases like one mentioned in issue #662.

@terriko and @pdxjohnny what's your opinion on it.

Niraj-Kamdar commented 4 years ago

New mapping test for test_version_scanner will only contain following attributes as parameter.

{
  "version": "3.7.1",  # correct version
  "checker_name": "python", 
  "version_strings": ["3.7.1", "lib/python3.7"] 
}

We will automatically generate filename from above info. While vendor-package pairs and contains string will be fetched from the respective checker class.

I am thinking about making a separate python file that just store this list of dictionary. So, It won't be this hard to navigate through code and we can also get benefit of black formatter.

I am going to drop CVEs in and not in tests from the version_scanner and include a few good corner tests for cve_scanner.

Niraj-Kamdar commented 4 years ago

Compiler dependency removed! Next task is adding gcc false positive detection test:

mingw_strings = [
    b"GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
    b"GNU C17 8.1.0 -mtune=core2 -march=nocona -g -g -g -O2 -O2 -O2 -fno-ident -fbuilding-libgcc -fno-stack-protector"
    b"../../../../../src/gcc-8.1.0/libgcc/libgcc2.c"
    b"C:\mingw810\x86_64-810-posix-seh-rt_v6-rev0\build\gcc-8.1.0\x86_64-w64-mingw32\libgcc"
]
glibc_strings = [
    b"GLIBC_2.2.5",
    b"printf@@GLIBC_2.2.5",
    b"__libc_start_main@@GLIBC_2.2.5"
    b"__cxa_finalize@@GLIBC_2.2.5"
]

gcc_strings = [
    b"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0",
]