mxnet sparse data format

szilard commented 8 years ago

Motivation: I can't run mxnet on the 10M-record airline dataset (https://github.com/szilard/benchm-ml/issues/29) because model.matrix runs out of RAM (on a g2.8xlarge with 60GB of RAM, the largest available for GPU instances).

Using Matrix::sparse.model.matrix to encode the categorical data would be great (uses <2GB RAM), but I get:

Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Strangely, on the 1M dataset I get a different error:

Error: io.cc:50: Seems X, y was passed in a Row major way, MXNetR adopts a column major convention.
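
A rough back-of-the-envelope sketch of why the dense route is likely to fail on 10M rows while the sparse encoding stays small. The column count (700) and the number of non-zeros per row (9) are hypothetical illustration values, not measured from the airline data. The 32-bit element limit is also an assumption about what triggers the 'problem too large' message when a sparse matrix is coerced to dense.

```python
# Back-of-the-envelope memory estimate: dense vs. sparse (CSC) one-hot design matrix.
# n_cols and nnz_per_row below are HYPOTHETICAL values chosen for illustration.

INT32_MAX = 2**31 - 1  # assumed cap on element count when 32-bit indices are used


def dense_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes needed to store every cell as a double."""
    return n_rows * n_cols * bytes_per_value


def csc_bytes(n_rows, n_cols, nnz_per_row, bytes_per_value=8, bytes_per_index=4):
    """Bytes for compressed-sparse-column storage:
    one value plus one row index per non-zero, plus n_cols + 1 column pointers."""
    nnz = n_rows * nnz_per_row
    return nnz * (bytes_per_value + bytes_per_index) + (n_cols + 1) * bytes_per_index


n_rows = 10_000_000   # from the thread: 10M records
n_cols = 700          # hypothetical number of one-hot columns
nnz_per_row = 9       # hypothetical: roughly one non-zero per input variable

dense_gb = dense_bytes(n_rows, n_cols) / 1e9
sparse_gb = csc_bytes(n_rows, n_cols, nnz_per_row) / 1e9
too_large = n_rows * n_cols > INT32_MAX

print(f"dense:  {dense_gb:.1f} GB")
print(f"sparse: {sparse_gb:.2f} GB")
print(f"element count exceeds 32-bit limit: {too_large}")
```

Under these assumed numbers the dense matrix needs about as much memory as the whole instance has, while the sparse one stays well under 2GB. With 1M rows the element count would fall below the assumed 32-bit limit, which may be why that case gets past the conversion and fails later with a different error.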

szilard commented 8 years ago

@tqchen @hetong007 Is sparse representation on the roadmap? See the thread above. (I know mxnet is very new, and I have to tell you I think it already looks pretty great.)

tqchen commented 8 years ago

Yes, this is something we should look into. Can you also open an issue on https://github.com/dmlc/mxnet/issues? Thanks

szilard commented 8 years ago

Cool, I'll do it soon.

szilard / benchm-ml

mxnet sparse data format #30