New end-to-end regression test module

This is a very old idea. We could add a new iped-tests module that runs full processing on some small public images (downloaded on demand and cached, or maybe copied to this repo) and checks some basic processing results, failing the build if they are not as expected:

Number of allocated files, deleted (non-carved) files, file paths, MACB times, content hashes;
Number of files recognized per MIME type/signature and per category;
Number of carved files per MIME type and their hashes;
Number of subitems extracted per parent container, their hashes and metadata (from the container, like internal paths and MACB times);
Extracted metadata (EXIF, office, email headers, etc.) keys and values for some (or all, since this will be automated) files;
Number of thumbnails generated per format, parsing exceptions per format, and other default filter counts;
Number of search results per format after searching for common words: prepositions, conjunctions, nouns. This evaluates content parsing and indexing at the same time.

I usually run most of the above tests manually on some real-world evidence before each major or feature release, and basic tests before bug-fix releases. That consumes a lot of time...

Running those tests automatically on small public standard images and failing the build if results are not OK should help a lot in detecting unintended regressions, including when we upgrade critical dependencies (sleuthkit, tika, lucene).

Additionally, we can produce diff reports of the above stats between two different versions (a stable one and a release candidate) after running them on large real-world regression datasets, to be reviewed by the developer/tester. These reports usually show divergences between versions that should be checked manually, because they can be bad but they can also be good: some files can move from one category to a better one, fewer files can be carved because more garbage is discarded or because of categorization differences, fewer search results can be returned because of fewer parsing exceptions (causing the strings parser to run on fewer files)...
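A diff report like that could be as simple as listing every statistic whose value changed, with a signed delta, and leaving the judgment to a human instead of failing anything. The sketch below assumes both versions export their stats as key-to-count maps. The class name StatsDiff and the example keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: reports differences between the stats of two versions.
final class StatsDiff {

    // One line per changed key, sorted by key: "key: before -> after (signed delta)".
    static List<String> report(Map<String, Long> stable, Map<String, Long> candidate) {
        TreeSet<String> keys = new TreeSet<>(stable.keySet());
        keys.addAll(candidate.keySet());
        List<String> lines = new ArrayList<>();
        for (String key : keys) {
            long before = stable.getOrDefault(key, 0L);
            long after = candidate.getOrDefault(key, 0L);
            if (before != after) {
                lines.add(String.format(Locale.ROOT, "%s: %d -> %d (%+d)",
                        key, before, after, after - before));
            }
        }
        return lines;
    }
}
```

Unchanged keys are omitted on purpose, so on a large dataset the report only contains the divergences that need a manual look.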

Maybe our new intern can help us to bring this old wish to life.

sepinf-inc / IPED

New end-to-end regression test module #1735