madeleineudell / LowRankModels.jl

LowRankModels.jl is a Julia package for modeling and fitting generalized low rank models.

Compute X given Y #52

Closed: cstjean closed this issue 8 years ago

cstjean commented 8 years ago

Once a model has been fit to a matrix A, is there any way to fit it to another matrix holding Y constant? For example, if factor analysis is part of a pipeline that ends with an SVM classifier, the cross-validation code should learn the feature matrix Y on the training set, and compute the data matrix X on the test set, given Y.

madeleineudell commented 8 years ago

Yes, you can use the FixedLatentFeaturesConstraint https://github.com/madeleineudell/LowRankModels.jl/blob/master/src/regularizers.jl#L157 as your regularizer on Y when fitting the second model:

ry = [FixedLatentFeaturesConstraint(Y[:, i]) for i = 1:size(Y, 2)]

Sorry that's not yet documented!
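To see why fixing Y makes the second fit cheap: for quadratic loss with no regularization on X, each row of X is an ordinary least-squares problem, which Julia's right-division operator solves directly. A self-contained sketch of that idea in base Julia (it does not use LowRankModels.jl; the variable names are illustrative):

```julia
using LinearAlgebra, Random

Random.seed!(0)
k, m, n = 3, 50, 20
Y = randn(k, n)            # latent feature matrix learned on the training set
A_new = randn(m, k) * Y    # new data lying in the row space of Y

# With Y held fixed and quadratic loss, the optimal X solves the
# least-squares problem min_X ||X*Y - A_new||_F^2; right division
# (A_new / Y) computes exactly that minimizer.
X_new = A_new / Y

norm(X_new * Y - A_new)    # essentially zero here, since A_new has rank k
```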


cstjean commented 8 years ago

That worked, thanks! For reference:

ry_B = [LowRankModels.FixedLatentFeaturesConstraint(glrm.Y[:, i]) for i = 1:size(glrm.Y, 2)]
glrm_B = GLRM(B, losses, rx, ry_B, k);
X_B, Y_B, ch = fit!(glrm_B);
Y_B == glrm.Y  # true

I'm looking for libraries to add to ScikitLearn.jl. Are you interested in supporting the scikit-learn interface? If so, I would make a PR like this one: https://github.com/davidavdav/GaussianMixtures.jl/pull/18

madeleineudell commented 8 years ago

Yes, I'd be very happy to have LowRankModels included in ScikitLearn.jl. I'm not sure what the best interface would be; some people will want to be able to ask for, say, NMF or PCA or Robust PCA by name, whereas others may want to specify a more nuanced model.

If you want to go ahead and wrap it, I suggest starting your PR from the dataframe-ux branch, which will be merged into master in the next few weeks. There are a few (small) breaking changes to the interface.


cstjean commented 8 years ago

Scikit-learn needs to store all hyperparameters in the type to support clone, and GLRM is missing fit!'s params. Two options:

  • I can add a fit_params field to the type definition, with a default value that maintains the current behaviour. Then I'll define ScikitLearnBase.fit!(::GLRM, ::Matrix), transform(::GLRM, ::Matrix), etc. I'll also need to add some pure-kwargs constructors, like scikit-learn does: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
  • Or I can create some brand new types, SkGLRM, PCA, NNMF, etc., that each contain a GLRM object.

Option 2 is less intrusive, but it's more types to maintain and tell users about. Any preference? I like option 1 in general, but it's not a great match for your codebase.

madeleineudell commented 8 years ago

Aesthetically, I prefer keeping the model separate from the algorithmic parameters, so if scikit-learn needs to store all the hyperparameters in the type, I would prefer making a new type. The simplest option is probably a new type SkGLRM <: AbstractGLRM. I don't think it would take much extra tooling to make all of the GLRM functionality accessible to SkGLRM in that case.

PCA and NNMF need not be extra types, but there could be specialized functions that instantiate SkGLRMs corresponding to those specialized models.

Are there other problems with this approach?
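For concreteness, a rough sketch of that design. Every name below (the fields, the pca/nnmf helpers, the stand-in abstract type) is hypothetical, not the actual LowRankModels.jl API:

```julia
# Stand-in for LowRankModels' abstract type, so the sketch is self-contained.
abstract type AbstractGLRM end

# Bundle the model specification with the algorithmic parameters that
# scikit-learn's clone() needs in order to reconstruct the estimator.
struct SkGLRM <: AbstractGLRM
    k::Int                         # rank of the factorization
    loss::Symbol                   # e.g. :quadratic, :hinge
    rx::Symbol                     # regularizer on X
    ry::Symbol                     # regularizer on Y
    fit_params::Dict{Symbol,Any}   # keyword arguments forwarded to fit!
end

# PCA and NNMF as convenience constructors rather than extra types:
pca(k::Int; fit_params = Dict{Symbol,Any}()) =
    SkGLRM(k, :quadratic, :zero, :zero, fit_params)
nnmf(k::Int; fit_params = Dict{Symbol,Any}()) =
    SkGLRM(k, :quadratic, :nonnegative, :nonnegative, fit_params)

model = pca(2)
```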

Madeleine
