ogrisel / pygbm

Experimental Gradient Boosting Machines in Python with numba.
MIT License

Implement support for sparse feature data #26

Open ogrisel opened 6 years ago

ogrisel commented 6 years ago

This would be useful, for instance, when all the data is passed as a scipy.sparse.csc_matrix (e.g. after one-hot encoding).
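The CSC format stores each column's non-zero entries contiguously, which suits a booster that processes one feature at a time. The sketch below shows why that matters. Gradient sums for a one-hot column can be computed by visiting only the stored entries, and the sum over the implicit zeros is recovered as the total minus the non-zero part. The helper names (one_hot_csc, column_entries, binary_column_gradient_sums) and the toy data are hypothetical illustrations, not part of pygbm.

```python
import numpy as np
from scipy import sparse


def one_hot_csc(codes, n_categories):
    """One-hot encode integer category codes into a CSC matrix (one column per category)."""
    n_samples = codes.shape[0]
    return sparse.csc_matrix(
        (np.ones(n_samples), (np.arange(n_samples), codes)),
        shape=(n_samples, n_categories),
    )


def column_entries(X_csc, j):
    """Row indices and values of the explicitly stored entries of column j."""
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    return X_csc.indices[start:end], X_csc.data[start:end]


def binary_column_gradient_sums(X_csc, j, gradients):
    """Gradient sums over rows where column j is zero and where it is non-zero.

    Only the stored entries are visited; the sum over the implicit zeros is
    obtained by subtracting the non-zero sum from the total gradient sum.
    """
    rows, _ = column_entries(X_csc, j)
    nonzero_sum = gradients[rows].sum()
    zero_sum = gradients.sum() - nonzero_sum
    return float(zero_sum), float(nonzero_sum)


# Hypothetical toy data: 6 samples, a categorical feature with 3 categories.
codes = np.array([0, 2, 1, 2, 0, 2])
X = one_hot_csc(codes, 3)
gradients = np.array([0.5, -1.0, 2.0, 0.25, -0.5, 1.0])

print(X.nnz)  # 6
print(binary_column_gradient_sums(X, 2, gradients))  # (2.0, 0.25)
```

The cost of the per-column pass is proportional to the number of stored entries rather than to the number of samples, which is the main reason to keep one-hot encoded data sparse.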

Pandas has support for sparse features: http://pandas.pydata.org/pandas-docs/stable/sparse.html

In particular, it has a dedicated data structure for 1D sparse data: SparseArray.
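As a rough illustration of what SparseArray exposes, here is a minimal sketch using the pandas.arrays.SparseArray class on a made-up, mostly-zero feature. The stored (non-fill) values and their integer positions are exactly what a per-feature binning or histogram step would need to iterate over.

```python
import numpy as np
import pandas as pd

# Hypothetical mostly-zero feature; 0.0 is treated as the implicit fill value.
values = np.array([0.0, 0.0, 3.5, 0.0, 0.0, 1.25, 0.0, 0.0])
arr = pd.arrays.SparseArray(values, fill_value=0.0)

# Only the non-fill values and their positions are stored.
print(arr.sp_values.tolist())         # [3.5, 1.25]
print(arr.sp_index.indices.tolist())  # [2, 5]
print(arr.density)                    # 0.25 (fraction of stored entries)

# The dense view is still available when needed.
dense = np.asarray(arr)
```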

There is also https://github.com/pydata/sparse, and I believe the ecosystem will converge at some point. I would be in favor of leveraging the data structures from Pandas to start with, as it is the most widely adopted solution and it allows for heterogeneously typed features (a mix of dense and sparse columns, categorical or numerical).
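To make the heterogeneous-features idea concrete, the sketch below assumes the booster accepts a pandas DataFrame and dispatches on each column's dtype: sparse columns yield only their stored entries, categorical columns yield integer codes, and everything else is treated as a dense numeric array. The function iter_feature_columns and the example columns are hypothetical; this is one possible design, not pygbm's API.

```python
import pandas as pd


def iter_feature_columns(df):
    """Yield (name, kind, payload) for each column of a DataFrame.

    - "sparse": (row indices of stored values, stored values, fill value)
    - "categorical": integer category codes
    - "dense": the column as a dense numpy array
    """
    for name in df.columns:
        col = df[name]
        if isinstance(col.dtype, pd.SparseDtype):
            sp = col.array  # the underlying pandas SparseArray
            yield name, "sparse", (sp.sp_index.indices, sp.sp_values, sp.fill_value)
        elif isinstance(col.dtype, pd.CategoricalDtype):
            yield name, "categorical", col.cat.codes.to_numpy()
        else:
            yield name, "dense", col.to_numpy()


# Hypothetical mixed-type feature table.
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0],
    "clicks": pd.arrays.SparseArray([0.0, 0.0, 3.0, 0.0], fill_value=0.0),
    "color": pd.Categorical(["red", "blue", "red", "green"]),
})

for name, kind, _ in iter_feature_columns(df):
    print(name, kind)
```

Dispatching per column keeps sparse and dense features in one container while letting each column use the cheapest traversal for its storage type.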