AlexVanMechelen closed 5 months ago
@AlexVanMechelen I'm not sure parallelism will significantly improve performance: feature extraction operations are essentially IO-bound rather than CPU-bound, so multithreading may be more suitable. Did you test your proposal and observe a significant performance increase?
@dhondta For the new CFG-based features, CPU processing power is the limiting factor.
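To make the IO-bound vs CPU-bound trade-off concrete, here is a minimal Python sketch, assuming the extractor is a plain Python function. `cpu_heavy_feature` and `extract_all` are hypothetical stand-ins, not the project's API. In CPython, threads share the GIL, so a thread pool mainly helps when work waits on I/O, while a process pool runs CPU-bound work in separate interpreters. Both expose the same `map` interface, so switching between them is a one-line change.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def cpu_heavy_feature(n: int) -> int:
    """Hypothetical stand-in for a CPU-bound feature (e.g. walking a CFG).

    Pure-Python arithmetic holds the GIL, so threads cannot run it in parallel.
    """
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def extract_all(samples: list[int], executor: Executor) -> list[int]:
    # Executor.map preserves input order, so results line up with samples.
    return list(executor.map(cpu_heavy_feature, samples))


if __name__ == "__main__":
    samples = [200_000] * 8
    # Threads: fine for I/O-bound work, but this CPU-bound function
    # effectively runs one thread at a time because of the GIL.
    with ThreadPoolExecutor(max_workers=8) as pool:
        threaded = extract_all(samples, pool)
    # Processes: each worker has its own interpreter and GIL,
    # so CPU-bound work can use all available cores.
    with ProcessPoolExecutor() as pool:
        multiproc = extract_all(samples, pool)
    print(threaded == multiproc)
```

The results are identical either way; only the wall-clock time differs, which is why the choice should follow where the extractor actually spends its time.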
Experiment: 100 samples from various packer categories plus non-packed samples, computing a single feature, "number_of_nodes":
1. No multiprocessing: 38:02 for 100 samples -> 0.044 samples/s
2. Multiprocessing with 32 CPU cores: 9:22 for 100 samples -> 0.178 samples/s -> 4x faster than without multiprocessing
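The rates and the 4x figure follow directly from the wall-clock times. A small Python check, assuming the times are minutes:seconds (`mm:ss`); `parse_mmss` and `rate` are illustrative helpers, not part of the project:

```python
def parse_mmss(t: str) -> int:
    """Convert an 'mm:ss' wall-clock time into seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


def rate(samples: int, t: str) -> float:
    """Throughput in samples per second."""
    return samples / parse_mmss(t)


serial = rate(100, "38:02")   # 100 samples / 2282 s
parallel = rate(100, "9:22")  # 100 samples / 562 s
print(f"{serial:.3f} {parallel:.3f} {parallel / serial:.1f}x")  # prints "0.044 0.178 4.1x"
```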
Note: the gain is this small (4x while the number of CPUs went up 32x) because of issue #106: the last few executables take far more extraction time due to that issue. For the first 89 samples, the rate was:

0:57 for the first 89 samples -> 1.561 samples/s -> 35x faster than the no-multiprocessing rate (0.044 samples/s over all 100 samples)
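Why a handful of slow samples cap the speedup can be illustrated with a simple scheduling model. Assuming the 0:57 figure comes from the same 32-core run, the last 11 samples account for roughly 562 - 57 = 505 s of the 9:22 total. The Python sketch below is a simplified model, not a measurement: `makespan` is a hypothetical helper that hands tasks, in order, to whichever worker frees up first and reports the overall wall time. The durations used are made up for illustration.

```python
import heapq


def makespan(durations: list[float], workers: int) -> float:
    """Wall time when tasks are handed, in order, to the first free worker.

    Models a pool dispatching one task at a time; ignores process
    start-up and inter-process communication overhead.
    """
    loads = [0.0] * workers  # busy-until time of each worker, kept as a min-heap
    for d in durations:
        earliest = heapq.heappop(loads)  # worker that becomes free first
        heapq.heappush(loads, earliest + d)
    return max(loads)


# Hypothetical durations: 100 quick samples vs. 99 quick ones plus a slow last one.
quick_only = makespan([1.0] * 100, workers=32)
with_straggler = makespan([1.0] * 99 + [100.0], workers=32)
print(quick_only, with_straggler)  # prints "4.0 103.0"
```

With 32 workers, 100 one-second tasks finish in 4 s, but a single 100-second task at the end pushes the wall time to 103 s. More cores cannot shorten the slowest individual sample, so the overall rate is dominated by the stragglers, consistent with the note about #106.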
This PR adds: