Closed by magsol 8 years ago
@magsol What about the Z file? Based on how we defined the Z matrix, it could be an MxP matrix. Considering our big data sample, which has 170x39,850 dimensions, our P here is 170, so our Z matrix would be 170xM. Suppose M = 100: then our Z matrix would be 170x100 and our D matrix would be 39,850x100, which are not correct. Should I change the Z dimensions to TxM instead of PxM?
@magsol
The second point is in `v[indices]`:
Should we also set the `v` vector as we discussed in our meeting? If yes, in the PySpark code, after invoking `op_selectTopR`, should we do the following:
```python
import numpy as np

# Keep only the top-R entries of v; zero out everything else.
indices = op_selectTopR(v, R)
temp_v = np.zeros(v.shape)
temp_v[indices] = v[indices]
v = temp_v
```
But my problem is mostly in the next part, where we broadcast the indices and the vector. The question is: should we broadcast the modified `v` along with all of its indices, or only the indices and vector elements that were selected by the `op_selectTopR` function?
If we set the non-top-R values to 0, we can just broadcast `v` and not the indices. We do the full vector-matrix multiplication with the 0s in the correct elements.
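To make that concrete, here is a minimal PySpark sketch of the idea, assuming `sc` is an active `SparkContext` and `S` is an RDD of `(row_index, row_vector)` pairs (those names, and the NumPy row vectors, are assumptions for illustration):

```python
import numpy as np

# v has already had its non-top-R entries zeroed, as in the snippet above.
v_bc = sc.broadcast(v)

# u_new = v * S: each row of S is scaled by its matching entry of v, and the
# scaled rows are summed. The zeroed entries of v contribute nothing, so no
# explicit index bookkeeping is needed on the workers.
u_new = (S.map(lambda kv: v_bc.value[kv[0]] * kv[1])
          .reduce(lambda a, b: a + b))
```

Broadcasting the dense zero-padded `v` trades a slightly larger broadcast payload for much simpler worker-side logic.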
As per our earlier discussions, we'll be making row-vs-column-wise operations a command-line flag (see ticket #52).
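As a sketch of what that flag might look like (purely hypothetical; the flag name and semantics here are assumptions, not what ticket #52 specifies):

```python
import argparse

parser = argparse.ArgumentParser(description="R1DL")
# Hypothetical flag: whether S is laid out observations-by-features (rows)
# or features-by-observations (columns).
parser.add_argument("--row-wise", action="store_true",
                    help="Treat each record of S as one observation "
                         "(a row of the P x T matrix).")
args = parser.parse_args()
```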
Please refer to the R1DL pseudocode.
We need to make sure operations are consistent. To that end, we define:

- `P` is the number of observations
- `T` is the number of features (or the dimensionality of the observations)
- `S`, the input matrix, should be observations-by-features, or `P` by `T`.

Orthogonal to this are the primary steps the program takes (see the shape-checking sketch after this list):

- `S` at the start.
- `v` is of length `P`, which means this operation should be `v = S * u_old`.
- `u` is of length `T`, which means this operation should be `u_new = v * S`.
- The outer product of `u` and `v` should be performed in the order `outer(v, u_new)`, to generate a `P` by `T` matrix of residuals to update `S`.
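Here is a minimal NumPy sketch of those steps, just to pin down the shapes (the variable names mirror the list above; `P`, `T`, and the random data are placeholders):

```python
import numpy as np

P, T = 170, 39850            # observations x features
S = np.random.randn(P, T)    # input matrix, P by T
u_old = np.random.randn(T)   # current dictionary atom, length T

v = S.dot(u_old)             # v = S * u_old  -> length P
# (top-R selection of v would happen here)
u_new = v.dot(S)             # u_new = v * S  -> length T

# Residual update, in the order outer(v, u_new): a P by T matrix.
S = S - np.outer(v, u_new)

assert v.shape == (P,) and u_new.shape == (T,) and S.shape == (P, T)
```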