tkellogg closed this 1 year ago
> Unfortunately, .values forces the entire DataFrame to be converted into a non-sparse 2D NumPy array.
Wow, good catch! Thanks for the PR.
Codecov report
Base: 77.45% // Head: 77.46% // Increases project coverage by +0.01%
Coverage data is based on head (29c97c0) compared to base (f248eb6). Patch coverage: 100.00% of modified lines in the pull request are covered.
Description
While the current functionality supports sparse matrices at the API level, it calls pd.DataFrame.values just to get the number of rows in the DataFrame. Unfortunately, .values forces the entire DataFrame to be converted into a dense 2D NumPy array, so switching to sparse matrices to fix an out-of-memory (OOM) error doesn't actually make the OOM go away. This fix is semantically identical in pandas terms, except that it doesn't materialize a dense array just to find its length.
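To make the difference concrete, here is a minimal sketch built on a hypothetical helper, make_sparse_frame, that creates a DataFrame whose columns are pandas SparseArrays. It assumes the fix swaps .values for an index-based length such as len(df.index) or df.shape[0]; the description doesn't say which call the PR actually uses. All three expressions report the same row count, but only .values allocates a full dense copy.

```python
import numpy as np
import pandas as pd


def make_sparse_frame(n_rows: int, n_cols: int) -> pd.DataFrame:
    """Hypothetical helper: a mostly-False boolean DataFrame with sparse columns."""
    columns = {}
    for j in range(n_cols):
        col = np.zeros(n_rows, dtype=bool)
        col[j % n_rows] = True  # a single True per column; the rest is the fill value
        columns[f"item_{j}"] = pd.arrays.SparseArray(col, fill_value=False)
    return pd.DataFrame(columns)


df = make_sparse_frame(n_rows=1_000, n_cols=50)

# Pattern described in the PR: .values interleaves every sparse column into
# one dense ndarray, and only then is the row count read from its shape.
dense = df.values
rows_via_values = dense.shape[0]

# Index-based alternatives (assumed, not confirmed by the PR): the row count
# comes from the index, so the sparse columns are never converted.
rows_via_index = len(df.index)
rows_via_shape = df.shape[0]

sparse_bytes = df.memory_usage(index=False).sum()
print(f"rows: {rows_via_values}, {rows_via_index}, {rows_via_shape}")
print(f"sparse storage: {sparse_bytes} bytes, dense copy: {dense.nbytes} bytes")
```

The dense copy grows with rows times columns regardless of how many entries are non-zero, which is why reading the length through .values can undo the memory savings of a sparse input.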
Related issues or pull requests
N/A
Pull Request Checklist
- ./docs/sources/CHANGELOG.md file (if applicable)
- ./mlxtend/*/tests directories (if applicable)
- mlxtend/docs/sources/ (if applicable)
- PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
- flake8 ./mlxtend