rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/
Other
4.82k stars, 853 forks

Fix fpmax for sparse matrices #1000

Closed: tkellogg closed this pull request 1 year ago

tkellogg commented 1 year ago

Description

While `fpmax` already supports sparse matrices at the API level, it calls `pd.DataFrame.values` just to get the number of rows in the DataFrame. Unfortunately, `.values` converts the entire DataFrame into a dense 2D NumPy array. So switching to sparse matrices to avoid an out-of-memory (OOM) error doesn't actually make the OOM go away.

In pandas terms, this fix is semantically identical: it returns the same row count, but without materializing a dense array just to find its length.
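The difference can be illustrated with plain pandas. This is a minimal sketch, not the code from the PR: the helper names `make_sparse_onehot`, `n_rows_via_values`, and `n_rows_via_index` and the toy transaction data are hypothetical, and the exact expression used in `fpmax.py` before and after the fix is assumed rather than quoted. It also assumes a pandas version that provides `pd.arrays.SparseArray`.

```python
import numpy as np
import pandas as pd


def make_sparse_onehot(dense, columns):
    """Build a DataFrame whose columns are pandas SparseArrays with fill value 0."""
    return pd.DataFrame(
        {col: pd.arrays.SparseArray(dense[:, i], fill_value=0)
         for i, col in enumerate(columns)}
    )


def n_rows_via_values(df):
    # Pre-fix pattern (assumed): .values interleaves every sparse column into a
    # dense 2D ndarray of shape (n_rows, n_cols), only to read its length.
    return len(df.values)


def n_rows_via_index(df):
    # Post-fix pattern (assumed): the row index already knows its length,
    # so no column data is converted or copied.
    return len(df.index)


# Toy one-hot transaction matrix: 5 transactions x 4 items (hypothetical data).
dense = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
])
df = make_sparse_onehot(dense, ["bread", "milk", "eggs", "butter"])

print(n_rows_via_index(df))      # 5
print(n_rows_via_values(df))     # 5, but only after building a dense copy
print(type(df.values))           # <class 'numpy.ndarray'>
```

`df.shape[0]` is an equally cheap alternative, since `DataFrame.shape` is derived from the row index and the column labels. On a 5×4 toy matrix the dense copy is harmless; the point is that its size grows with rows × columns no matter how sparse the data is.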

Related issues or pull requests

N/A

Pull Request Checklist

rasbt commented 1 year ago

> Unfortunately, .values forces the entire DataFrame to be converted into a non-sparse 2D numpy array

Wow, good catch! Thanks for the PR

codecov[bot] commented 1 year ago

Codecov Report

Base: 77.45% // Head: 77.46% // Increases project coverage by +0.01% :tada:

Coverage data is based on head (29c97c0) compared to base (f248eb6). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1000      +/-   ##
==========================================
+ Coverage   77.45%   77.46%   +0.01%
==========================================
  Files         198      198
  Lines       11171    11171
  Branches     1406     1406
==========================================
+ Hits         8652     8654       +2
+ Misses       2305     2304       -1
+ Partials      214      213       -1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| mlxtend/frequent\_patterns/fpmax.py | `91.20% <100.00%> (ø)` | |
| mlxtend/evaluate/counterfactual.py | `100.00% <0.00%> (+6.89%)` | :arrow_up: |
