intel / scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
https://intel.github.io/scikit-learn-intelex/
Apache License 2.0

Memory leak using RandomForestClassifier and PCA #1881

Open cannolis opened 1 month ago

cannolis commented 1 month ago

Describe the bug

I am encountering a persistent memory leak when using RandomForestClassifier and PCA from the sklearnex library. With each iteration of my loop, the memory usage increases by approximately 20MB, which significantly impacts performance during large-scale data processing.

To Reproduce

Steps to reproduce the behavior:

  1. Set up the environment with sklearnex installed.
  2. Initialize and configure RandomForestClassifier and PCA.
  3. Run a loop where RandomForestClassifier and PCA are used on the data.
  4. Observe the memory usage growth with each iteration.
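The report does not include code, so the steps above can only be sketched. The following is a minimal, hypothetical reproducer: the synthetic data, the array sizes, the estimator parameters, and the helper names `measure_growth` and `fit_once` are all assumptions, not taken from the original report. Note that `tracemalloc` only sees allocations made through Python's allocator; memory held by native code would need a process-level probe instead (for example, RSS read via the third-party psutil package), which can be passed in as `probe`.

```python
import gc
import tracemalloc


def python_heap_bytes():
    """Default probe: bytes currently allocated through Python's allocator."""
    return tracemalloc.get_traced_memory()[0]


def measure_growth(step, iterations=5, probe=python_heap_bytes):
    """Call `step` repeatedly and return one memory reading per iteration."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        readings = []
        for _ in range(iterations):
            step()
            gc.collect()  # rule out garbage that is merely awaiting collection
            readings.append(probe())
    finally:
        if started_here:
            tracemalloc.stop()
    return readings


def fit_once(n_samples=2000, n_features=50):
    """One loop iteration: PCA followed by a random forest (assumed workload)."""
    # Local imports so the measuring helper works without these packages.
    import numpy as np
    from sklearnex.decomposition import PCA
    from sklearnex.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_samples, n_features))
    y = rng.integers(0, 2, size=n_samples)
    X_reduced = PCA(n_components=10).fit_transform(X)
    RandomForestClassifier(n_estimators=50).fit(X_reduced, y)
```

Calling `measure_growth(fit_once, iterations=20)` and comparing consecutive readings would show whether usage keeps climbing or levels off after the first few iterations.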

Expected behavior

I expect the memory usage to remain stable or return to the baseline after each iteration, ensuring efficient performance during large-scale data processing.

Environment:

  • OS: Windows 10
  • Compiler: PyCharm
  • Version: 2024.1.2 Professional Edition

samir-nasibli commented 1 month ago

Hi @cannolis, thank you for the report! Please share more details about your environment, including the versions of scikit-learn-intelex and daal4py.

cannolis commented 1 month ago

Hi @samir-nasibli

Here are the details about my environment:

  • Python version: 3.9.19
  • scikit-learn-intelex version: 2024.4.0
  • daal4py version: 2024.4.0
  • scikit-learn version: 1.3.0

Thank you for looking into this issue. I appreciate your help and support. If you need any further information, please let me know.
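For anyone reporting a similar issue, the version details requested above can be gathered in one step. This is a small sketch using only the standard library; the helper name `collect_versions` is hypothetical, and the distribution names are assumed to match the package names as published on PyPI.

```python
import platform
from importlib import metadata


def collect_versions(packages=("scikit-learn-intelex", "daal4py", "scikit-learn")):
    """Return a dict of the Python version and each package's installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep going so the report is still usable if a package is missing.
            report[name] = "not installed"
    return report


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```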

md-shafiul-alam commented 3 weeks ago

Hi @cannolis, thank you for raising the issue. Can you please provide a reproducer for your specific case? My initial investigation, based on your description, doesn't show any noticeable memory growth.