rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/
Other
4.87k stars 857 forks source link

Adding Multiprocessing feature #998

Closed Ne-oL closed 1 year ago

Ne-oL commented 1 year ago

Code of Conduct

Description

This update to the code allows the plot of Decision Regions to be executed on all available cores, in addition there an option to implement it on cluster-wide nodes using the 'ray' parameter in the calling. I'm not a programmer by profession so I'm sure my implementation is not pythonic and probably needs a lot of work. so I hope the maintainer can modify or take inspiration of my implementation in case it can't be merged with the main branch as it is. I found that the speed bottleneck for the decision_regions drawing was due to the slow prediction. to fix it, I added parallelization and distributed the iterative prediction on all available CPUs (local ). I have tested the changes and so far there doesn't seem to be any issues. the speed gain i measured on a small dataset was 21X the original (449 seconds vs 21 seconds on local HPC instance of 36 cores). the large dataset >9,000 input took days(>5 days) to draw, now its taking ~12 minutes on a HPC Cluster (72CPUs). its not their intended goal but it basically addresses issues #801, #804 ### Related issues or pull requests Fixes #804, #801 ### Pull Request Checklist
rasbt commented 1 year ago

Wow thanks for this! I've been out of office and am a bit tied up right now, but I am looking forward to reviewing this PR. Sounds super awesome and promising. Thanks for submitting this!

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: -0.17 :warning:

Comparison is base (33b9a6c) 77.46% compared to head (806de15) 77.30%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #998 +/- ## ========================================== - Coverage 77.46% 77.30% -0.17% ========================================== Files 200 200 Lines 11287 11311 +24 Branches 1480 1486 +6 ========================================== Hits 8744 8744 - Misses 2324 2348 +24 Partials 219 219 ``` | [Impacted Files](https://codecov.io/gh/rasbt/mlxtend/pull/998?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka) | Coverage Δ | | |---|---|---| | [mlxtend/classifier/stacking\_cv\_classification.py](https://codecov.io/gh/rasbt/mlxtend/pull/998?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka#diff-bWx4dGVuZC9jbGFzc2lmaWVyL3N0YWNraW5nX2N2X2NsYXNzaWZpY2F0aW9uLnB5) | `100.00% <ø> (ø)` | | | [mlxtend/plotting/decision\_regions.py](https://codecov.io/gh/rasbt/mlxtend/pull/998?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka#diff-bWx4dGVuZC9wbG90dGluZy9kZWNpc2lvbl9yZWdpb25zLnB5) | `0.00% <0.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

Ne-oL commented 1 year ago

No worries, i really appreciate the work you have done in this library. In all honesty, i copied 'decision_regions' content to my jupyter notebook, modified it, and have been using it as a method ever since. So i didn't do my due diligence in testing it as a package, But I will review the code again and troubleshoot any issues in the weekend. Hopefully, when you have the time, it would be in a much better shape. Thank you again for the awesome library.

Ne-oL commented 1 year ago

sorry i was busy and didn't have the time to work on it in the weekend. So it took a bit longer to finish it. here is a summary of what i have done in it:

As you can see, there are several opinionated choices in the code, however, as i mentioned before I'm not a programmer so my choices are just what seems logical to me (and not necessarily the best or the pythonic way) so please feel free to change it as you see fit.

rasbt commented 1 year ago

I am super sorry about this! I must have been out during the holidays when this PR landed, and I am just seeing this now! Thanks for your work on this, and I am definitely excited to merge this into mlxtend before the next version release!

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

rasbt commented 1 year ago

I think this was a bit messy since you made the changes in master, but no worries. This is a fantastic PR btw. Thanks a lot!

Ne-oL commented 1 year ago

Sorry about the noob mistake of making changes directly in master, its my first PR in all honesty. is there a way to easily rectify it? and sorry for troubling you with the linter and formatting issues. i didn't know how to do it and just forgot about it with my ongoing research. I'm unsure if there is anything extra on my end required to complete the merge or its all up to you now?

rasbt commented 1 year ago

No worries @Ne-oL it's all good now :). Thanks a for the PR!