packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
GNU General Public License v3.0
49 stars 10 forks source link

No `predict` method for some clustering algorithms #47

Closed smarbal closed 1 year ago

smarbal commented 1 year ago

The following error occurs when training some clustering algorithms :

┌──[user@packing-box]──[/mnt/share]──[main|+2…6]────────                                                                                 ────[172.17.0.3]──[16:32:37]────
$ model train fs-upx-reduced-EP -a dbscan 
00:00:03.385 [INFO] Selected algorithm: Density-Based Spatial Clustering of Applications with Noise
00:00:03.387 [INFO] Reference dataset:  fs-upx-reduced-EP(PE32,PE64)
00:00:03.388 [INFO] Loading features...
00:00:03.429 [INFO] Making pipeline...
00:00:03.434 [INFO] Training model...
00:00:03.434 [INFO] (step 1/1) Processing dbscan
Traceback (most recent call last):
  File "/home/user/.opt/tools/model", line 120, in <module>
    getattr(name, args.command)(**vars(args))
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 594, in train
    self._train.predict = self.pipeline.predict(self._train.data)
  File "/home/user/.local/lib/python3.10/site-packages/pbox/learning/model.py", line 70, in __getattribute__
    return object.__getattribute__(object.__getattribute__(self, "pipeline"), name)
  File "/home/user/.local/lib/python3.10/site-packages/sklearn/utils/metaestimators.py", line 127, in __get__
    if not self.check(obj):
  File "/home/user/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 46, in check
    getattr(self._final_estimator, attr)
AttributeError: 'DBSCAN' object has no attribute 'predict'

This is because the sklearn model does not implement a predict() method. Instead, the fit_predict() method should be used.

dhondta commented 1 year ago

@smarbal The difficulty with this is that a few algorithms, either supervised or unsupervised, use the API with .fit(...) and .predict(...), as you can see for RandomForestClassifier or KMeans while DBSCAN doesn't. I will program an alternative workflow for this case.

dhondta commented 1 year ago

@smarbal I did not test yet, I must confess. If you have time for, please do...

smarbal commented 1 year ago

This works as intended, thank you.