sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

New end-to-end regression test module #1735

Open lfcnassif opened 1 year ago

lfcnassif commented 1 year ago

This is a very old idea. We could have a new iped-tests module to run full processing on some small public images (downloaded on demand and cached, or maybe copied to this repo) and check some basic processing results, failing the build if not good:

I usually run most of the above tests manually on some real-world evidence before each major or feature release, and basic tests before bug-fix releases. That consumes a lot of time...

Running those tests automatically on small, public, standard images and failing the build if the results are not OK should help a lot in detecting unintentional regressions, including when we upgrade critical dependencies (sleuthkit, tika, lucene).
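To make the idea concrete, here is a minimal sketch of such a check. It is written in Python only for brevity (the real module would presumably live in the Java build), and everything in it is an assumption for illustration: the stat names, the dict layout, and the helper names `find_mismatches` and `assert_no_regression` are hypothetical, not existing IPED code or output.

```python
# Hypothetical sketch: compare stats from a processing run against a stored
# baseline and fail if anything differs. Stat names are invented examples.

def find_mismatches(expected, actual):
    """Return human-readable mismatches between two stat dicts."""
    problems = []
    for key in sorted(set(expected) | set(actual)):
        want = expected.get(key)
        got = actual.get(key)
        if want != got:
            problems.append(f"{key}: expected {want}, got {got}")
    return problems


def assert_no_regression(expected, actual):
    """Raise AssertionError (failing the test run) on any mismatch."""
    problems = find_mismatches(expected, actual)
    if problems:
        raise AssertionError("regression detected:\n" + "\n".join(problems))


if __name__ == "__main__":
    # Baseline stored next to the test image (values are made up).
    baseline = {"total_items": 1200, "carved_items": 35, "parsing_exceptions": 2}
    # In a real test these would be read from the processed case.
    current = {"total_items": 1200, "carved_items": 35, "parsing_exceptions": 2}
    assert_no_regression(baseline, current)
    print("no regression")
```

An exact-match check like this only makes sense on small fixed images where results are expected to be deterministic. Any stat that legitimately changes would need its baseline updated in the same commit.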

Additionally, we can produce diff reports of the above stats between two different versions (a stable one and a release candidate) after running both on large real-world regression datasets provided by the developer/tester. These reports usually show divergences between versions that should be checked manually, because they can be bad but they can also be good: some files can move from one category to a better one, fewer files may be carved because more garbage is discarded or because of categorization differences, there may be fewer search results because of fewer parsing exceptions (causing the strings parser to run on fewer files)...
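A rough sketch of what such a diff report could look like, again in Python just for brevity. The stat keys (per-category counts, carved items, search hits) and the function names `diff_stats` and `format_diff` are assumptions made up for this example, not an existing IPED format. Unlike a pass/fail test, this only lists divergences for a human to review.

```python
# Hypothetical sketch: list every stat that differs between a stable version
# and a release candidate, with the signed delta. Keys are invented examples.

def diff_stats(stable, candidate):
    """Return (key, stable_value, candidate_value, delta) for changed stats."""
    rows = []
    for key in sorted(set(stable) | set(candidate)):
        old = stable.get(key, 0)      # a stat missing on one side counts as 0
        new = candidate.get(key, 0)
        if old != new:
            rows.append((key, old, new, new - old))
    return rows


def format_diff(rows):
    """Render the changed stats as a plain-text table."""
    lines = [f"{'stat':<24}{'stable':>10}{'candidate':>12}{'delta':>8}"]
    for key, old, new, delta in rows:
        lines.append(f"{key:<24}{old:>10}{new:>12}{delta:>+8}")
    return "\n".join(lines)


if __name__ == "__main__":
    stable = {"category:Images": 100, "carved": 40, "search:password": 17}
    candidate = {"category:Images": 100, "carved": 35,
                 "category:Videos": 3, "search:password": 12}
    print(format_diff(diff_stats(stable, candidate)))
```

Unchanged stats are left out on purpose, so the report stays short enough to be checked manually on large datasets.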

Maybe our new intern can help us to bring this old wish to life.

lfcnassif commented 1 year ago

The diff reports are actually an idea borrowed from the Tika project, and maybe we can use a similar approach: https://cwiki.apache.org/confluence/display/tika/TikaEval