Open gdbeck opened 3 years ago
Hi!
I want to first clarify the intent of the question. Let's say A is a matrix (or DataFrame/sparse matrix) of m rows by n columns. The GLRM (assuming real-valued or boolean-valued data for simplicity) produces a matrix X of m rows by k columns, and a matrix Y of k rows by n columns, where k is the rank.
It sounds like you have another dataset B, of size p rows by n columns. B's projection on the PCA components from A would be a matrix of size p rows by k columns. In PCA with no missing values and centered data, this would be a matrix multiplication (B * Y' * <a diagonal matrix>). However, that projection doesn't work with the structure of GLRM because that formula is only correct with a quadratic (least-squares) loss function.
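To make the matrix-multiplication case concrete, here is a minimal sketch using only Julia's LinearAlgebra standard library. The toy sizes and random data are made up for illustration, and it assumes A is centered and fully observed. With a quadratic loss and no regularization, the best X_b for a fixed Y is B * Y' * inv(Y * Y'). When the rows of Y are orthogonal, Y * Y' is diagonal, which is where the diagonal matrix in the formula comes from.

```julia
using LinearAlgebra

# Hypothetical toy data: A is m×n training data, B is p×n new data (same columns).
m, n, p, k = 20, 5, 4, 2
A = randn(m, n)
B = randn(p, n)

# Rank-k PCA of A via the SVD (assumes A is already centered and fully observed).
F = svd(A)
Y = F.Vt[1:k, :]             # k×n, rows are orthonormal principal directions

# Least-squares projection of B onto the rows of Y: minimizes the Frobenius
# norm of B - X_b * Y over X_b.
X_b = (B * Y') / (Y * Y')    # p×k; Y * Y' is the identity here, so X_b equals B * Y'
```

If Y were scaled differently (for example, rows multiplied by singular values), Y * Y' would still be diagonal but no longer the identity, and the division would supply the rescaling.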
With LowRankModels, the easiest way to do this is to fit another GLRM while holding Y constant. You can do this like so:
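using LowRankModels

loss = QuadLoss()  # Or whatever loss you chose before
r_x = ZeroReg()    # Or whichever regularizer you desired on X
# Pin every column of Y to the values fitted on A
r_y = [FixedLatentFeaturesConstraint(Y[:, i]) for i=1:size(Y, 2)]
n_comp = size(Y, 1)  # The rank must match the number of rows of the fixed Y
glrm_b = GLRM(B, loss, r_x, r_y, n_comp)
X_b, Y, ch = fit!(glrm_b)  # Y comes back unchanged because it is held fixed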
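For intuition about what the fit above computes when B has masked entries, here is a rough sketch for the special case of quadratic loss and no regularization on X. In that case each row of X_b is an ordinary least-squares fit against only the observed columns. The function project_with_missing is a hypothetical helper written for this illustration; it is not part of LowRankModels, which arrives at its solution iteratively and supports other losses.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LowRankModels): fit one row of X_b per row
# of B, using only the observed entries of that row and keeping Y fixed.
function project_with_missing(B::AbstractMatrix, Y::AbstractMatrix)
    p, k = size(B, 1), size(Y, 1)
    X_b = zeros(p, k)
    for i in 1:p
        obs = findall(!ismissing, B[i, :])   # observed column indices in row i
        b = Float64.(B[i, obs])              # observed values as plain floats
        # Least squares: find x such that Y[:, obs]' * x is closest to b.
        X_b[i, :] = permutedims(Y[:, obs]) \ b
    end
    return X_b
end

# Toy check: rows of B lie exactly in the row space of Y, with one entry masked.
Y = [1.0 0.0 1.0  2.0;
     0.0 1.0 1.0 -1.0]                       # k = 2, n = 4
X_true = [2.0 3.0; -1.0 0.5]
B = convert(Matrix{Union{Missing, Float64}}, X_true * Y)
B[1, 3] = missing
X_b = project_with_missing(B, Y)
```

Because the toy rows are exact combinations of the rows of Y, the masked entry does not stop the original coefficients from being recovered. With noisy data, each row gets the best fit over whatever entries are observed.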
If you want to calculate a new Y matrix instead of a new X matrix, just keep r_y to be whatever you used as r, and define r_x = [FixedLatentFeaturesConstraint(X[:, i]) for i=1:size(X, 2)].
That works perfectly! Thank you very much for your help, and for replying so quickly!
Hi there, this looks like a great package. I'm particularly interested in the ability to fit LRMs to datasets with missing data (or in my case, outliers that need to be masked). I have a quick question that may be pretty basic, but an answer would help me to apply the code to my own data. Apologies if I've missed something in the documentation. I'm also fairly new to Julia.
If I fit a PCA model to a set of training data A (following your example):
how do I then apply the same model to a new set of data B? I would like to keep X fixed and obtain new values Y_b that give the best fit of X to B. That is, I would like to project the observations in B onto the PCA components found from A.
There are other PCA packages in Julia that will do this (e.g., the reconstruct function in MultivariateStats), but they don't seem to be able to handle missing data or sparse arrays.
Thanks in advance! Any help is appreciated!