My preferences:
- Question 1: Where should the test data be stored?
- Question 2: How do we pre-parse the long tests?
- Question 3: What do we do with the old file tests?
For Question 1: we can put this data in a separate JSON file in the test directory, which we load when we run the tests.
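A minimal sketch of how that could look, assuming a hypothetical `tests/test_data.json` and a hypothetical `get_cves()` helper standing in for however the scanner exposes its results:

```python
import json
from pathlib import Path

import pytest

# Hypothetical file: tests/test_data.json, one entry per checker.
TEST_DATA = json.loads((Path(__file__).parent / "test_data.json").read_text())


@pytest.mark.parametrize("entry", TEST_DATA, ids=lambda e: e["package"])
def test_version_mapping(entry):
    # get_cves() is a hypothetical stand-in for the scanner's lookup.
    cves = get_cves(entry["package"], entry["version"])
    for cve in entry["are_in"]:
        assert cve in cves
    for cve in entry["not_in"]:
        assert cve not in cves
```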
@Niraj-Kamdar that sounds like a good solution
For Question 1: even better, we can use a CSV file to store the info. Contributors can open it in Excel, and it would be easy to read and write.
Hm, that's an interesting thought. Previously, we've kind of assumed that people contributing checkers are fairly code-savvy (because they had to be), but with the new setup that's probably not as true. I worry that CSV isn't the greatest solution for multi-line data, though.
That's got me thinking, though: if we're assuming a lower barrier to entry for checker writing, maybe we should start with the checker data in pythonic arrays rather than JSON, so that it gets covered by the Black formatting. A lot of our problem in the test cases right now is that a huge number of beginner commits meant we weren't as careful about reviewing the alphabetization. The autoformatter might be especially valuable here. I'm sure there's an equivalent autoformatter for JSON we could use (and probably should eventually), but maybe we should start with what we have.
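For example, the mapping data could be plain Python that Black keeps tidy (the `VERSION_MAPPINGS` name is just for illustration; the CVE data is from the cups example later in this thread):

```python
# Black enforces the wrapping and indentation; reviewers only need to
# check that entries stay alphabetized.
VERSION_MAPPINGS = {
    "cups": {
        "1.2.4": {
            "are_in": ["CVE-2007-5849", "CVE-2007-7892"],
            "not_in": ["CVE-2004-8272", "CVE-2005-0206", "CVE-2005-0990"],
        },
    },
}
```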
Another thought re: Question 1. How do people feel about having doctests?
https://docs.python.org/3/library/doctest.html
I feel like at least for filenames and version checking it might be really helpful. The mapping tests would probably be too unwieldy, since the CVE mapping happens elsewhere.
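For instance, a checker's version-extraction helper could carry its own quick checks as doctests (a sketch; `guess_version` and its regex are hypothetical, not the actual checker API):

```python
import re


def guess_version(string):
    """Extract a version from a matched signature string.

    >>> guess_version("This is Best Library 1.1.1d")
    '1.1.1d'
    >>> guess_version("no version here")
    ''
    """
    match = re.search(r"\d+\.\d+\.\d+[a-z]?", string)
    return match.group(0) if match else ""


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```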
I didn't really understand your concerns about CSV, but if you are worried about the in/not_in arrays, we can flatten those like the following. I know it won't look great if we edit it as a text file, but in Excel it will be more human readable, and we can also leverage Excel to sort the CSV file for us.
```csv
package, version, are_in, not_in
cups, 1.2.4, CVE-2007-5849, CVE-2005-0206
cups, 1.2.4, CVE-2007-7892, CVE-2005-0990
cups, 1.2.4, , CVE-2004-8272
```
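If we did go that route, consuming the flattened file would be straightforward (a sketch, assuming the columns above):

```python
import csv


def load_mapping_rows(path):
    """Yield (package, version, are_in, not_in) rows from the flattened CSV."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f, skipinitialspace=True):
            # Blank cells (like the empty are_in in the last cups row) are skipped,
            # so each row contributes at most one CVE per column.
            are_in = [row["are_in"]] if row["are_in"] else []
            not_in = [row["not_in"]] if row["not_in"] else []
            yield row["package"], row["version"], are_in, not_in
```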
heh. You clearly have not been stuck in enterprise America if you haven't seen people screw up spreadsheets. Flattening would help, but... I don't think we actually have any particular need to support anything other than straight Python code, and we'll get a slight performance improvement if we don't have to keep parsing data. Let's just stick with keeping the tests directly in some form of Python code unless there's a compelling reason to do additional parsing.
Update:
Pre-parsing thoughts:
I took a quick look at our signatures, and right now the shortest ones are around 10 characters. strings reports every string longer than 4 characters; if we up that minimum to 8 or 10, we might be able to use it as a first pass to do unintelligent pre-parsing on the existing tests. I don't know yet whether it'll reduce the size meaningfully; I'm going to run some more tests.
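To illustrate that first pass (a rough sketch, not our actual strings module; the 8-character minimum is the knob being discussed):

```python
import re
from pathlib import Path

MIN_LEN = 8  # strings' default minimum is 4; our shortest signatures are ~10


def candidate_strings(path, min_len=MIN_LEN):
    """Rough equivalent of `strings -n 8`: printable-ASCII runs >= min_len."""
    data = Path(path).read_bytes()
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]
```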
I think we should not store the parsed strings as one Python list, because it would be very long and a package contains many files, so we would lose the filename information. I propose we download a package if it isn't parsed yet and extract it using the extractor. Then we parse every file of the extracted package with our strings module, save the output under the same name, compress the whole directory, and store it in our repo. Here's the UML for it.
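Roughly, that pipeline in Python (`download()` and `extract()` are hypothetical stand-ins for the existing downloader and Extractor, and `candidate_strings()` is the sketch from earlier in the thread):

```python
import shutil
from pathlib import Path


def preparse_package(url, out_dir):
    """Download, extract, run strings on each file, then compress the result.

    Keeping one output file per input file preserves the filename
    information that a single flat list of strings would lose.
    """
    extracted = extract(download(url))  # hypothetical helpers
    parsed_root = Path(out_dir)
    for path in Path(extracted).rglob("*"):
        if path.is_file():
            dest = parsed_root / path.relative_to(extracted)
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text("\n".join(candidate_strings(path)))
    # One compressed artifact per package, stored in the repo.
    shutil.make_archive(str(parsed_root), "gztar", root_dir=parsed_root)
```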
Advantages of the above system:
Current status:
I believe the remaining issues discussed here were fixed in #1036. If anything wasn't, please feel free to open a new issue.
With the new checker setup, we have an opportunity to modernize our tests. This was previously discussed in #638, but this issue is to summarize where we're at now:
Current state:
- test_filename_is in test_checkers.py
- test_files in test_scanner.py
- test_binaries in test_scanner.py
Problems:
- The test_filename_is tests mostly don't exist, since this was reserved as an "easy first commit" for new contributors. (Added by @Niraj-Kamdar when the checkers were updated.)
- The test_files/test_binaries tests are a huge, disorganized parametrize array right now. (Fixed by @SaurabhK122 in #675.)

Solution high-level ideas:
Questions:
Question 1: Where should the test data be stored?
Option 1: Test data lives in the test files.
e.g.

```python
def test_bestlibrary():
    valid_filenames = ["libbest4.3.2.so", "best"]
    valid_strings = ["This is Best Library 1.1.1d"]
    valid_mappings = {
        "version": "1.2.3",
        "should_have": ["CVE-123-1234", "CVE-123-1235"],
        "should_not_have": ["CVE-123-1555"],
    }
    test_filenames("best", valid_filenames)
    test_strings("best", valid_strings)
    test_mappings("best", valid_mappings)
```
Option 2: Test data lives with the checker.
```python
valid_filenames = ["libbest4.3.2.so", "best"]  # trigger filename tests
valid_strings = ["This is Best Library 1.1.1d"]  # trigger mapping tests
```
Option 3: Hybrid. Store some basic stuff (like valid filenames and a single test string) in the checker; leave longer stuff to the test suite.
Question 2: How do we pre-parse the long tests?
Question 3: What do we do with the old file tests?