packing-box / docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
GNU General Public License v3.0

Dataset plot issue (multiprocessing) #141

Closed jramhani closed 4 months ago

jramhani commented 4 months ago

I rebuilt the Docker image and cloned a fresh copy of packing-box. Now I get this error (with multiprocessing again, weirdly):

$ dataset plot features upx_bl1 number_wx_sections
/home/user/.local/lib/python3.12/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
00:00:01.329 [INFO] Computing features...
  1% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   6/588 samples • 0:00:00 • 0:00:55 • upx_bl1
Traceback (most recent call last):
  File "/home/user/.opt/tools/dataset", line 215, in <module>
    getattr(ds, args.command)(**vars(args))
  File "/home/user/.local/lib/python3.12/site-packages/pbox/core/dataset/__init__.py", line 682, in plot
    self._compute_all_features(**kw)
  File "/home/user/.local/lib/python3.12/site-packages/pbox/core/dataset/__init__.py", line 173, in _compute_all_features
    for basename, features in p.track(pool.imap_unordered(self._compute_features_worker, self),
  File "/home/user/.local/lib/python3.12/site-packages/pbox/helpers/rendering.py", line 10, in track
    for value in (sequence if silent else super(CustomProgress, self).track(sequence, *args, **kwargs)):
  File "/home/user/.local/lib/python3.12/site-packages/rich/progress.py", line 1209, in track
    for value in sequence:
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
TypeError: void() takes at least 1 positional argument (0 given)

The problem might be in the Dataset class:

pbox/core/dataset/__init__.py

def _compute_all_features(self, n_jobs=None, **kw):
    """ Convenience function for computing the self._data pandas.DataFrame containing the feature values. """
    if self._files:
        self.logger.info("Computing features...")
        from multiprocessing import Pool
        with Pool(processes=n_jobs or config['number_jobs']) as pool:
            with progress_bar(target=self.basename) as p:
                for basename, features in p.track(pool.imap_unordered(self._compute_features_worker, self),
                                                  total=len(self)):
                    self[basename] = (features, True)  # True: force updating the row
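
The traceback above stops at `raise value` in `multiprocessing/pool.py`, which is where the pool re-raises, in the parent process, an exception that originally occurred in a worker. It therefore doesn't show which sample or which line inside the worker triggered the `TypeError`. A minimal debugging sketch, not the project's code: wrap the worker so that it returns the formatted traceback instead of raising. `traced_call`, `find_failures` and `fake_worker` are hypothetical names for illustration, and `fake_worker` only imitates the error message.

```python
import traceback
from functools import partial
from multiprocessing import Pool


def traced_call(func, item):
    """Call func(item); return (item, result, None) or (item, None, traceback text)."""
    try:
        return item, func(item), None
    except Exception:
        # Format the traceback inside the worker, where the full stack still exists.
        return item, None, traceback.format_exc()


def find_failures(func, items, n_jobs=2):
    """Run func over items in a pool and collect per-item tracebacks instead of raising."""
    failures = {}
    with Pool(processes=n_jobs) as pool:
        # partial() of a module-level function is picklable, so it can be sent to workers.
        for item, _result, tb in pool.imap_unordered(partial(traced_call, func), items):
            if tb is not None:
                failures[item] = tb
    return failures


def fake_worker(path):
    """Hypothetical stand-in for a feature worker: fails on one sample only."""
    if path == "bad.exe":
        raise TypeError("void() takes at least 1 positional argument (0 given)")
    return len(path)
```

Calling `find_failures(fake_worker, ["a.exe", "bad.exe", "c.exe"])` from a script's `if __name__ == "__main__":` block should return a dict keyed by the failing sample, with the worker-side traceback as the value. The same wrapper idea could presumably be applied around `_compute_features_worker` to see which sample and which feature computation fail.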