scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

ENH: allow SAG* objects to take a RowDataset as argument. #94

Closed fabianp closed 8 years ago

fabianp commented 8 years ago

This is important when the data is in a custom Dataset format and one doesn't want to convert it to numpy arrays or sparse matrices, e.g. when the data does not fit in memory.

The biggest change is to get_auto_step_size, which now accepts a Dataset object; it was previously written only for numpy arrays and sparse matrices.
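To illustrate why a step-size helper can work from row-wise access alone, here is a minimal sketch. It assumes the step size is taken as 1/L, where L is a Lipschitz bound built from the largest squared row norm (a common choice for SAG-style solvers). `ToyRowDataset`, `max_squared_row_norm` and `auto_step_size` are hypothetical names for this example and are not lightning's classes or functions; the exact constants lightning uses may differ.

```python
import numpy as np


class ToyRowDataset:
    """Hypothetical stand-in for a row-oriented dataset (not lightning's RowDataset)."""

    def __init__(self, rows):
        # rows: list of (indices, values) pairs, one pair per sample
        self.rows = [(np.asarray(i, dtype=np.int64), np.asarray(v, dtype=np.float64))
                     for i, v in rows]

    def get_n_samples(self):
        return len(self.rows)

    def get_row(self, i):
        # Only one row is materialized at a time, so the full matrix
        # never has to be held as a numpy array or sparse matrix.
        return self.rows[i]


def max_squared_row_norm(ds):
    """Stream over the rows and keep the largest squared L2 norm."""
    best = 0.0
    for i in range(ds.get_n_samples()):
        _, values = ds.get_row(i)
        best = max(best, float(np.dot(values, values)))
    return best


def auto_step_size(ds, alpha, loss="log"):
    """Return 1/L for an assumed Lipschitz bound L (illustrative constants)."""
    msn = max_squared_row_norm(ds)
    if loss == "log":
        lipschitz = 0.25 * msn + alpha
    elif loss == "squared":
        lipschitz = msn + alpha
    else:
        raise ValueError("unsupported loss: %r" % loss)
    return 1.0 / lipschitz
```

Because the helper only ever calls `get_row`, the same code would work for a dataset that reads rows lazily from disk.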

fabianp commented 8 years ago

@casotto might be interested in this

mblondel commented 8 years ago

This could also be useful to create a dataset which computes feature products on the fly (possibly combined with the hashing trick).

mblondel commented 8 years ago

http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf

Section 2 discusses how to use the hashing trick with feature pairs.
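A rough sketch of what computing feature products on the fly could look like: each sparse row is expanded into pairwise products, and each pair of feature indices is hashed into a fixed number of buckets. The function names and the pairing-then-modulo hash are assumptions chosen for illustration, not an existing lightning API.

```python
import numpy as np


def hash_pair(j1, j2, n_buckets):
    """Map an index pair to a bucket: Cantor pairing followed by a modulo."""
    return ((j1 + j2) * (j1 + j2 + 1) // 2 + j2) % n_buckets


def hashed_pair_row(indices, values, n_buckets):
    """Expand one sparse row into hashed degree-2 interaction features.

    Products whose index pairs collide in the same bucket are summed.
    """
    acc = {}
    n = len(indices)
    for a in range(n):
        for b in range(a + 1, n):
            h = hash_pair(int(indices[a]), int(indices[b]), n_buckets)
            acc[h] = acc.get(h, 0.0) + float(values[a]) * float(values[b])
    keys = sorted(acc)
    out_idx = np.array(keys, dtype=np.int64)
    out_val = np.array([acc[k] for k in keys], dtype=np.float64)
    return out_idx, out_val
```

A dataset wrapper could call `hashed_pair_row` inside its row accessor, so the expanded features are never stored; the cost is quadratic in the number of non-zeros per row.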

fabianp commented 8 years ago

good to merge then?

mblondel commented 8 years ago

> good to merge then?

Yep.

BTW, what did you need this for? Just curious :)

mblondel commented 8 years ago

In the future, we might want to add additional helper functions such as add and dot, and if possible an efficient way to iterate over non-zero indices.
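For concreteness, such helpers might look roughly like the sketch below. `SparseRow` and its methods are hypothetical and only show the intended semantics: `dot` computes an inner product with a dense weight vector, `add` does a scaled in-place update, and `nonzero` iterates over the stored entries.

```python
import numpy as np


class SparseRow:
    """Hypothetical view of one sparse sample, for illustration only."""

    def __init__(self, indices, values):
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)

    def dot(self, w):
        """Inner product <x, w>, touching only the non-zero coordinates."""
        return float(np.dot(self.values, w[self.indices]))

    def add(self, w, scale=1.0):
        """In-place update w += scale * x; np.add.at also handles repeated indices."""
        np.add.at(w, self.indices, scale * self.values)

    def nonzero(self):
        """Iterate over (index, value) pairs of the stored entries."""
        return zip(self.indices.tolist(), self.values.tolist())
```

With these three operations a solver would not need to know how a row is stored, only how to take inner products and apply updates.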