Open bjkomer opened 10 years ago
This is a good question...
You know which pre-processing pipelines yield sparse features, right?
How about making a choice between (a) a sparse-output pipeline plus a reasonable classifier for sparse data, and (b) a dense-output pipeline plus a classifier for dense data?
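To make the (a)/(b) idea concrete, here is a rough sketch in plain Python. The branch contents (`tfidf`, `pca`, `sgd`, and so on) are placeholder labels chosen for illustration, not hyperopt-sklearn identifiers, and the groupings are not a claim about what each estimator supports. In hyperopt itself this would presumably be a top-level `hp.choice` over the two branches.

```python
import random

# Placeholder names for illustration only; not hyperopt-sklearn identifiers.
BRANCHES = {
    "sparse": {
        "preprocessing": ["tfidf"],
        "classifier": ["sgd", "multinomial_nb"],
    },
    "dense": {
        "preprocessing": ["pca"],
        "classifier": ["knn", "svc"],
    },
}


def sample_pipeline(rng):
    """Pick a branch first, then pick both steps from inside that branch."""
    branch = rng.choice(sorted(BRANCHES))
    options = BRANCHES[branch]
    return {
        "branch": branch,
        "preprocessing": rng.choice(options["preprocessing"]),
        "classifier": rng.choice(options["classifier"]),
    }


if __name__ == "__main__":
    print(sample_pipeline(random.Random(0)))
```

Because the classifier is only ever drawn from the same branch as the pre-processing step, a sparse-output pipeline can never be paired with a dense-only classifier.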
On Tue, May 6, 2014 at 12:47 PM, Brent Komer <notifications@github.com> wrote:
For example, KNN has a distance metric parameter, and some metrics (e.g. chebyshev) cannot be used on sparse data. We need a clean way to prevent these from being selected when the data is sparse.
One way could be to define separate search spaces for sparse and dense data (e.g. knn() and knn_sparse()). Another option could be a sparse/dense flag that changes how the space is defined (e.g. knn(sparse=True)).
I'm leaning towards the second option.
Reply to this email directly or view it on GitHub: https://github.com/hyperopt/hyperopt-sklearn/issues/28
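A minimal sketch of the flag-based option, assuming the search space can be stood in for by a plain dict of candidate values. The name `knn` and the `sparse=True` flag come from the proposal above; the helper `knn_metric_options`, the metric list, and the `n_neighbors` range are hypothetical. Only chebyshev's incompatibility with sparse data is taken from the issue.

```python
# Illustrative metric list; extend as needed.
ALL_METRICS = ("euclidean", "manhattan", "chebyshev")

# Metrics that cannot be used on sparse input (chebyshev per the issue text).
DENSE_ONLY_METRICS = frozenset({"chebyshev"})


def knn_metric_options(sparse=False):
    """Return the metrics that may be searched over for the given input type."""
    if sparse:
        return [m for m in ALL_METRICS if m not in DENSE_ONLY_METRICS]
    return list(ALL_METRICS)


def knn(sparse=False):
    """Build a stand-in KNN search space as a dict of candidate values."""
    return {
        "n_neighbors": list(range(1, 11)),  # hypothetical range
        "metric": knn_metric_options(sparse),
    }


if __name__ == "__main__":
    print(knn(sparse=True)["metric"])
    print(knn()["metric"])
```

Keeping the dense-only metrics in one set means a single `knn()` definition serves both cases, which is the main appeal of the flag over a separate `knn_sparse()`.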