packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
GNU General Public License v3.0

Error in one feature computation causes other features to not be computed #124

Closed AlexVanMechelen closed 5 months ago

AlexVanMechelen commented 5 months ago

Issue

Some executables yield a CFG with very few nodes. If the total length of the extracted instruction sequence is smaller than the n-gram size n, ngram_hist returns an empty list. Features that index into its result, such as zeropad(128, default=0)(binary['cfg']['ngram_hist'](3, True, False)[1])[0], then raise IndexError: list index out of range. The failure is not limited to the ngram_hist-related features: all other features are left empty as well, including the non-CFG-based ones that would otherwise compute successfully.
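
One way to picture the failure mode and a possible mitigation is the minimal Python sketch below. It is not the project's actual code: toy_ngram_hist, compute_features and the extractor names are hypothetical stand-ins, and the real ngram_hist takes different arguments. The sketch only assumes that features are evaluated one after another and that an uncaught exception in one of them aborts the whole run. Catching the exception per feature keeps the remaining features intact.

```python
from collections import Counter


def toy_ngram_hist(tokens, n):
    """Hypothetical stand-in for ngram_hist: n-gram counts, most frequent first.

    Returns an empty list when len(tokens) < n, mirroring the reported behaviour.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return [count for _, count in Counter(grams).most_common()]


def compute_features(extractors, sample, default=None):
    """Evaluate each feature on its own so that one failure does not affect the others.

    extractors maps a feature name to a callable taking the sample (assumption).
    Returns (values, errors); failed features get `default` and an error entry.
    """
    values, errors = {}, {}
    for name, extract in extractors.items():
        try:
            values[name] = extract(sample)
        except (IndexError, KeyError, ValueError) as exc:
            values[name] = default
            errors[name] = repr(exc)
    return values, errors


# A sample whose instruction sequence is shorter than the n-gram size (n = 3).
sample = {"tokens": ["push", "mov"], "size": 4096}

extractors = {
    # Non-CFG feature that should always compute.
    "file_size": lambda s: s["size"],
    # Indexing into an empty histogram raises IndexError.
    "top_3gram_count": lambda s: toy_ngram_hist(s["tokens"], 3)[0],
}

values, errors = compute_features(extractors, sample)
```

With this structure, file_size is still computed for the short sample, while top_3gram_count falls back to the default and its IndexError is recorded in errors rather than propagated. Whether a default value or a missing entry is the better outcome for a failed feature is a design choice left open here.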

Samples

/mnt/share/dataset-packed-pe/not-packed/IEExec.exe
/mnt/share/dataset-packed-pe/not-packed/AddInProcess32.exe
/mnt/share/dataset-packed-pe/not-packed/MacMakeup.exe
/mnt/share/dataset-packed-pe/not-packed/Updater5.exe