packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
GNU General Public License v3.0

`RuntimeError` using modified features set #45

Closed dhondta closed 1 year ago

dhondta commented 1 year ago

From @smarbal :

Hello @dhondta, my reduced feature set is ready, but I can't test it right now because of this bug. Note that a bug also occurs during the feature computation phase when using a full dataset (with files) together with a reduced feature set:

┌──[user@packing-box]──[/mnt/share/experiments/exp-1]──[.]──[improve-visualization|+6…18]────[172.17.0.3]──[11:33:41]────
$ model train upx-PE1 -a kmeans -f conf/features.conf 
00:00:03.745 [INFO] Selected algorithm: K-Means clustering
00:00:03.747 [INFO] Reference dataset:  upx-PE1(PE32,PE64)
00:00:03.748 [INFO] Computing features...
00:00:03.901 [WARNING] Bad expression: checksum == 0
00:00:03.901 [ERROR] name 'checksum' is not defined
00:00:03.905 [WARNING] Bad expression: size_of_headers == 512
00:00:03.905 [ERROR] name 'size_of_headers' is not defined
00:00:03.909 [WARNING] Bad expression: size_of_initializeddata >= 3 * 1024 * 1024
00:00:03.909 [ERROR] name 'size_of_initializeddata' is not defined
Traceback (most recent call last):
  File "/home/user/.opt/tools/model", line 121, in <module>
    getattr(name, args.command)(**vars(args))
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 527, in train
    if not self._prepare(**kw):
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 217, in _prepare
    __parse(ds.files.listdir(is_exe), False)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 201, in __parse
    self._data = self._data.append(exe.data, ignore_index=True)
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/executable.py", line 28, in data
    return Features(self)
  File "/home/user/.local/lib/python3.10/site-packages/tinyscript/preimports/log.py", line 91, in _wrapper
    return f(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/features/__init__.py", line 153, in __init__
    for name in self:
RuntimeError: dictionary changed size during iteration
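
The final `RuntimeError` in the traceback is what Python raises when entries are added to or removed from a dictionary while a `for` loop is iterating over it. The following is a minimal sketch of how failing expressions such as the ones logged above could lead to it, and of the usual remedy of iterating over a snapshot of the keys. It is not the actual `pbox` code: `raw_values`, `expressions`, `evaluate`, `compute_unsafe` and `compute_safe` are hypothetical names, and the assumption that failing features are dropped during the loop is only a guess at the mechanism.

```python
# Hypothetical stand-in for a features registry: derived features are
# expressions evaluated against raw values extracted from an executable.
raw_values = {"entropy": 7.2}  # no PE-specific values, as for an MS-DOS file
expressions = {
    "high_entropy": "entropy > 7.0",
    "null_checksum": "checksum == 0",           # needs a PE-only value
    "small_headers": "size_of_headers == 512",  # needs a PE-only value
}


def evaluate(expr, values):
    """Evaluate one expression; return None if it references a missing name."""
    try:
        return eval(expr, {"__builtins__": {}}, dict(values))
    except NameError:
        return None


def compute_unsafe(expressions, values):
    """Delete failing entries while iterating the same dict (raises RuntimeError)."""
    features = dict(expressions)
    for name in features:
        if evaluate(features[name], values) is None:
            del features[name]
    return features


def compute_safe(expressions, values):
    """Iterate over a snapshot of the keys, so deleting entries is safe."""
    features = dict(expressions)
    for name in list(features):
        result = evaluate(features[name], values)
        if result is None:
            del features[name]
        else:
            features[name] = result
    return features
```

Calling `compute_unsafe(expressions, raw_values)` fails with the same "dictionary changed size during iteration" message, while `compute_safe` keeps only the feature whose expression could be evaluated.
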
dhondta commented 1 year ago

@smarbal For the following lines :

00:00:03.901 [WARNING] Bad expression: checksum == 0
00:00:03.901 [ERROR] name 'checksum' is not defined
00:00:03.905 [WARNING] Bad expression: size_of_headers == 512
00:00:03.905 [ERROR] name 'size_of_headers' is not defined
00:00:03.909 [WARNING] Bad expression: size_of_initializeddata >= 3 * 1024 * 1024
00:00:03.909 [ERROR] name 'size_of_initializeddata' is not defined

You probably have MS-DOS executables in your dataset. When we compute the Features registry, pefeats only applies to PE32, PE64 and .NET; that's the expected behavior, as pefeats fails to parse MS-DOS files. For the MS-DOS files of your dataset, the values produced by pefeats (`checksum`, `size_of_headers`, `size_of_initializeddata`) are therefore never defined, and the feature expressions that rely on them fail with the errors shown above. The quick fix is to remove the MS-DOS executables from your dataset.
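
To spot the MS-DOS executables to remove, one option is to check each file's header. The sketch below relies only on the documented layout of these formats: both start with the `MZ` magic, and a PE file additionally stores at offset `0x3C` the position of a `PE\0\0` signature. The helper names `is_pe` and `is_msdos_only` are hypothetical, and this is not packing-box's own filtering mechanism, which may provide a more direct way to do this.

```python
import struct


def is_pe(data: bytes) -> bool:
    """Return True if the bytes start with an MZ header pointing to a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def is_msdos_only(data: bytes) -> bool:
    """Return True for an MZ executable that has no PE signature."""
    return data[:2] == b"MZ" and not is_pe(data)
```

In practice, `data` would be the first bytes of each file in the dataset folder, for example read with `pathlib.Path(path).read_bytes()`.
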