Closed Niraj-Kamdar closed 4 years ago
Nice job digging into this! :D
The following might be helpful (I added it to test/test_strings.py
to see what happened)
def test_mkbin(self):
with tempfile.NamedTemporaryFile() as fileobj:
fileobj.write(("bzip2-1.0.2").encode("ascii") + b"\n")
fileobj.seek(0)
strings = Strings()
strings.filename = fileobj.name
print(strings.parse().split("\n"))
# Prints:
# ['bzip2-1.0.2', '']
Next steps here are probably to add a new version of test_binaries
and _binary_test
in test_scanner.py
, have the new version also take the strings to have in the "binary" and then start moving tests from the old test_binaries
to the new function. Sound good?
Yeah, I will start working on this.
It's worth noting that I of course only tested that on our version of strings, not the real binary, it is probably worth cross checking that it works with the real strings
If we're still using that, I can't remember if it gets used at all
Yes, I have checked it. It's working fine with strings command too.
I have an idea which will make checker writing process easy and we can avoid writing test_scanner for short tests completely.
When we add new checker we add package and vendor name in it. So, we can get package, vendor name, guess contains strings and version regex from the checker. We can use cached db to find cves that should be in this version and cves that shouldn't be in this version. So, we have all the parameters needed for _binary_test.
To make it practical we have to either make list of guess_contains strings and version regex string global or use class whichever seems more appropriate.
@terriko, @pdxjohnny needs your opinions on it.
To make above idea practical, We can have a base checker like below and every checkers inherit it. If we do this in most of the case we just have to correctly initialize inherited class. Also, instead of specifying package, vendor pair in docstring, we can put it as class attribute if there isn't any particular reason for the above implementation.
class Checker:
def __init__(self):
contains = []
regex = []
vendor = ""
package = ""
def guess_contains(self, lines):
for line in lines:
if any(contain in line for contain in self.contains):
return True
return False
def get_version(self, lines, filename):
version_info = dict()
if "bzip" in filename.lower():
version_info["is_or_contains"] = "is"
elif self.guess_contains(lines):
version_info["is_or_contains"] = "contains"
if "is_or_contains" in version_info:
version_info["modulename"] = self.package
version_info["version"] = regex_find(lines, *self.regex)
return version_info
Here is an example of inherited class. In this case we don't have to add other methods although for special cases we can override the above methods.
class OpensslChecker(Checker):
def init(self):
contains = [
"part of OpenSSL",
"openssl.cnf",
"-DOPENSSL_"
]
regex = [
r"part of OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
r"OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
]
vendor = "openssl"
package = "openssl"
I'd say those properties would be better at the class level than the instance level. i.e.
class OpensslChecker(Checker):
CONTAINS = [
"part of OpenSSL",
"openssl.cnf",
"-DOPENSSL_"
]
REGEX = [
r"part of OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
r"OpenSSL ([01]+\.[0-9]+\.[0-9]+[a-z]*) ",
]
VENDOR_PACKAGE = [("openssl", "openssl",)]
Also, some checkers have multiple vendor package pairs, so we probably want a list of tuples.
@terriko, thoughts?
tuples are good. I'd say those are class properties and don't need an init, yes. And yes, we've been talking about inheritance for the checkers for quite some time, but I'd been mentally pushing it off until after 1.0 released. So now is the perfect time to talk about this.
I'm also in favour of removing the compiler dependency for tests, but one thing we'll lose is the early warning if someone writes a checker that's so general that it detects the version of gcc or glibc. I think we can write specific mocked up tests for those (and they'd be better anyhow because we could make it more explicit what's going wrong in the test messages), we just need to not forget to do that and put it in the short tests so people see it early.
I agree we should use class property and tuples seems fine.
Yes, we can make a test file which contains some of commonly found strings that looks like version string and test it against every checker and raise an error if checker catch false positive.
I have also scanned cli.py an I am highlighting here parts of the code which will and may require changes.
Update cli.py:79 load_checkers
Update cli.py:98 available_checkers
Update cli.py:179 scan_file
Remove cli.py:69 vendor_package_pairs
May affect cli.py:103 remove_skiplist
May affect cli.py:114 print_checkers
May affect cli.py:439 main
We also have to change test_checkers.py and test_scanner.py. We probably don't have to change test_cli.py because we are only using main function here( We aren't testing inner functions.)
I am thinking about putting common checker base class in checker module's __init__.py
and every other checker will inherit it. If you think this structure is okay then I will start working on it.
@Niraj-Kamdar Sounds good!
I have also scanned test_scanner.py & test_cli.py and I am highlighting here parts of the code which will and may require changes. |
action | what | line | function |
---|---|---|---|---|
remove: | BINARY_PATH var | test_scanner.py:21 | global | |
remove: | make all call |
test_scanner.py:37 | setup_class | |
add: | named temp files with binary strings | test_scanner.py:37 | setup_class | |
add: | delete temp files | test_scanner.py:54 | teardown_class | |
replace: | BINARY_PATH with new temp path | test_scanner.py:65 | scan_file | |
remove: | vendor_package_pairs method | cli.py:69 | - | |
remove: | test_vendor_package_pairs | test_scanner.py:67 | - | |
may affect: | self.scan_file() | test_scanner:100 | _binary_test | |
replace: | make call |
test_cli:213 | test_unknown_warning | |
replace: | make call |
test_cli:232 | test_quite_mode |
We won't need test_file
now. so, we can safely remove it.
Since, I am going to refactor as mentioned in issue #647 , I feel like I should do this with re-factorization. There are a couple of reasons for that:
Currently, scanner class do both job of finding versions and CVEs, In issue #647 I am going to refactor that in two parts version_scanner
and cve_scanner
. So, test_version_scanner
only test version_scanner
part while test_cve_scanner
will test: __is cve_scanner returning CVEs from the previous version?__
Here, cve_scanner
is independent from version_scanner
So, testing cve_scanner
for every checkers we make isn't really necessary and probably bad way of testing cve_scanner
. In case of cve_scanner
we should focus on corner cases like one mentioned in issue #662.
@terriko and @pdxjohnny what's your opinion on it.
New mapping test for test_version_scanner will only contain following attributes as parameter.
{
"version": "3.7.1", # correct version
"checker_name": "python",
"version_strings": ["3.7.1", "lib/python3.7"]
}
We will automatically generate filename from above info. While vendor-package pairs and contains string will be fetched from the respective checker class.
I am thinking about making a separate python file that just store this list of dictionary. So, It won't be this hard to navigate through code and we can also get benefit of black formatter.
I am going to drop CVEs in
and not in
tests from the version_scanner and include a few good corner tests for cve_scanner.
Compiler dependency removed! Next task is adding gcc false positive detection test:
mingw_strings = [
b"GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
b"GNU C17 8.1.0 -mtune=core2 -march=nocona -g -g -g -O2 -O2 -O2 -fno-ident -fbuilding-libgcc -fno-stack-protector"
b"../../../../../src/gcc-8.1.0/libgcc/libgcc2.c"
b"C:\mingw810\x86_64-810-posix-seh-rt_v6-rev0\build\gcc-8.1.0\x86_64-w64-mingw32\libgcc"
]
glibc_strings = [
b"GLIBC_2.2.5",
b"printf@@GLIBC_2.2.5",
b"__libc_start_main@@GLIBC_2.2.5"
b"__cxa_finalize@@GLIBC_2.2.5"
]
gcc_strings = [
b"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0",
]
Currently, we are compiling c files to produce binary files that contains strings which can be found in the package. Most of the strings generated by compiling c file is just the compiler dump which we are ignoring anyway. So, why don't we use struct(as mentioned by @pdxjohnny) or plain binary strings which will save time and space. I was experimenting on struct and I found out binary file produced by using struct is same as we generate from just writing binary strings. Here is the script I used for this experiment.
Here,
struct.out
is same asbytestrings.out
!