Closed dhondta closed 2 years ago
Hi @smarbal I adapted the code to be more precise for this error. Please retry and post the traceback.
Here's the new traceback :
model train test-dataset -a kmeans
00:00:02.276 [INFO] Selected algorithm: K-Means clustering
00:00:02.278 [INFO] Reference dataset: test-dataset(ELF64)
00:00:02.279 [INFO] Computing features...
Traceback (most recent call last):
File "/opt/tools/model", line 117, in <module>
getattr(m, args.command)(**vars(args))
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/model.py", line 521, in train
if not self._prepare(**kw):
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/model.py", line 220, in _prepare
__parse(ds.files.listdir(is_executable), False)
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/model.py", line 203, in __parse
self._features.update(exe.features)
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/executable.py", line 32, in features
return {n: f.description for n, f in Features.registry[self.format].items()}
File "/usr/lib/python3.8/functools.py", line 967, in __get__
val = self.func(instance)
File "/usr/local/lib/python3.8/dist-packages/pbox/common/executable.py", line 126, in format
raise ValueError("'%s' has signature '%s' which is not supported" % (self, self.filetype))
ValueError: '/root/.packing-box/datasets/test-dataset/files/06d986b913b685936b365565b5204867aa2d388cdec3c5d5f9810561c31fb8f9' has signature 'POSIX shell script executable (binary data)' which is not supported
For an unknown reason, it seems that a script was included when the dataset was generated, causing executable-related computation to fail because its `format` attribute is `None`. I need to inspect the dataset generation workflow to prevent it from adding files whose `format` attribute is `None`.
@smarbal You can try `dataset fix test-dataset` and retry your command.
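To check by hand which files in a dataset folder are not ELF binaries (the shell script in the traceback above, for instance), something like the following could be used. This is only a minimal sketch and not part of pbox: `looks_like_elf` and `split_dataset` are hypothetical helper names, and the check relies solely on the 4-byte ELF magic number, which is much cruder than the signature detection pbox performs.

```python
from pathlib import Path

# Every ELF file starts with these four bytes (0x7f followed by "ELF").
ELF_MAGIC = b"\x7fELF"


def looks_like_elf(path):
    """Return True if the file starts with the ELF magic number (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def split_dataset(folder):
    """Split the regular files of a folder into (elf_files, other_files)."""
    elf, other = [], []
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            (elf if looks_like_elf(p) else other).append(p)
    return elf, other


if __name__ == "__main__":
    import sys
    # Usage: python check_elf.py <path to the dataset's files folder>
    _, suspicious = split_dataset(sys.argv[1])
    for p in suspicious:
        print("not an ELF file:", p)
```

Any file listed by this script is a candidate for the kind of unsupported entry reported in the `ValueError` above.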
@dhondta I ran `dataset fix test-dataset` and ran into this issue after retrying the command:
# model train test-dataset -a kmeans
00:00:02.016 [INFO] Selected algorithm: K-Means clustering
00:00:02.017 [INFO] Reference dataset: test-dataset(ELF64)
00:00:02.018 [INFO] Computing features...
00:00:03.087 [WARNING] Bad expression: checksum == 0
00:00:03.087 [ERROR] name 'checksum' is not defined
Traceback (most recent call last):
File "/opt/tools/model", line 117, in <module>
getattr(m, args.command)(**vars(args))
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/model.py", line 521, in train
if not self._prepare(**kw):
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/model.py", line 220, in _prepare
__parse(ds.files.listdir(is_executable), False)
File "/usr/local/lib/python3.8/dist-packages/pbox/learning/model.py", line 203, in __parse
self._features.update(exe.features)
TypeError: 'NoneType' object is not iterable
dataset :
test-dataset 20 3MB yes ELF64 {11},upx{3},gzexe{1},midgetpack{1},upx-3.92{1},upx-3.94{1},upx-3.95{1},ward{1}
By creating a new dataset with only the UPX packer, I was able to train a model and didn't run into this issue.
@smarbal OK, this is expected when `exe.format` is `None`. This means that, even after running the `fix` command, at least one non-executable file remains in your dataset.
This part is fixed with e81b4eada878ae13aedc0a9e0199046238c95674.
I have not yet found the cause of the issue in the dataset generation.
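For readers wondering where the `TypeError` comes from: calling `dict.update(None)` raises a `TypeError`, which matches the traceback if `exe.features` evaluated to `None` for the unsupported file. The snippet below is a minimal sketch of a defensive guard, not the actual change made in the commit mentioned above; `merge_features` is a hypothetical name used only for illustration.

```python
def merge_features(registry, per_file_features):
    """Merge per-file feature dicts into registry, skipping None entries.

    Hypothetical helper: returns the number of skipped entries so that the
    caller can warn about files whose features could not be computed.
    """
    skipped = 0
    for feats in per_file_features:
        if feats is None:  # e.g. a file whose format was not recognised
            skipped += 1
            continue
        registry.update(feats)
    return skipped


if __name__ == "__main__":
    registry = {}
    n = merge_features(registry, [{"entropy": "Shannon entropy"}, None])
    print(registry, "skipped:", n)

    # Without the guard, the update itself fails:
    try:
        {}.update(None)
    except TypeError as exc:
        print("TypeError:", exc)
```

The feature name `entropy` is made up for the example and is not taken from pbox's feature registry.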
@smarbal You can try `dataset fix test-dataset` again (after a `pbox-update`, of course) and retry your command.
@dhondta An error now occurs on `dataset fix test-dataset`:
dataset fix test-dataset
Traceback (most recent call last):
File "/opt/tools/dataset", line 149, in <module>
getattr(ds, args.command)(**vars(args))
File "/usr/local/lib/python3.8/dist-packages/pbox/common/utils.py", line 147, in _wrapper
return f(s, *a, **kw)
File "/usr/local/lib/python3.8/dist-packages/pbox/common/dataset.py", line 339, in fix
if exe.format is None: # unsupported or bad format (e.g. Bash script)
AttributeError: 'Path' object has no attribute 'format'
@smarbal My bad. Once again, please.
@dhondta Worked well, thanks !
For a dataset composed of ELF files, `model train` produces this error: