quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

Consensus on row-wise vs. column-wise operations #51

Closed magsol closed 8 years ago

magsol commented 8 years ago

Please refer to the R1DL pseudocode.

We need to make sure operations are consistent. To that end, we define

Orthogonal to this are the primary steps the program takes.

MOJTABAFA commented 8 years ago

@magsol What about the Z file? Based on what we defined for the Z matrix, it would be a P×M matrix. But consider our big data sample, which has dimensions 170×39,850: P is 170 here, so Z would be 170×M. Supposing M = 100, Z would then be 170×100 and D would be 39,850×100, which are not correct. Should I change the Z dimensions to T×M instead of P×M?
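As a sanity check, here is a tiny NumPy sketch assuming the T×M convention proposed above (T = number of samples, P = number of features, M = dictionary size; all names here are illustrative, not from the codebase):

```python
import numpy as np

# Illustrative shapes from the example above (T samples, P features,
# M dictionary atoms). The T x M convention for Z is the one proposed
# in this comment, not (yet) the project's settled definition.
T, P, M = 170, 39850, 100
S = np.zeros((T, P))   # data matrix: 170 x 39,850
Z = np.zeros((T, M))   # loadings: one length-M row per sample
D = np.zeros((M, P))   # dictionary: one length-P atom per row

# With Z as T x M, the factorization composes back to S's shape.
assert (Z @ D).shape == S.shape
```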

MOJTABAFA commented 8 years ago

@magsol
My second point concerns v[indices]: should we also set the v vector the way we discussed in our meeting? If so, in the PySpark code, after invoking op_selectTopR, should we do the following:

    indices = op_selectTopR(v, R)
    temp_v = np.zeros(v.shape)
    temp_v[indices] = v[indices]
    v = temp_v

But my problem is mostly with the next part, where we need to broadcast the indices and the vector. The question is: should we broadcast the modified v and all of its indices, or only the indices and vector elements that were selected by the op_selectTopR function?
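For reference, here is one way op_selectTopR could be implemented (a sketch only; the real function's signature and tie-breaking may differ). It selects the R largest-magnitude entries via np.argpartition, then applies the zero-fill step from the snippet above:

```python
import numpy as np

def op_selectTopR(v, R):
    # Indices of the R largest-magnitude entries of v (unordered);
    # argpartition finds them in O(n) without fully sorting the vector.
    return np.argpartition(np.abs(v), -R)[-R:]

# Toy example: keep the top-2 entries of a length-5 vector.
v = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
indices = op_selectTopR(v, 2)

temp_v = np.zeros(v.shape)
temp_v[indices] = v[indices]   # everything else stays 0
v = temp_v                     # v is now [0, -3, 0, 2, 0]
```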

magsol commented 8 years ago

If we set the non-top-R values to 0, we can just broadcast v and not the indices. We then do the full vector-matrix multiplication, with 0s in the correct elements.
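A quick NumPy check of that claim (toy shapes; S, v, and R here are illustrative): multiplying by the zero-filled v gives the same result as restricting the multiplication to the selected indices, so only v itself needs to be broadcast:

```python
import numpy as np

S = np.arange(12, dtype=float).reshape(4, 3)  # toy 4 x 3 matrix
v = np.array([0.5, -1.0, 2.0, 0.3])           # one weight per row of S
R = 2

indices = np.argpartition(np.abs(v), -R)[-R:]  # top-R by magnitude
temp_v = np.zeros(v.shape)
temp_v[indices] = v[indices]                   # non-top-R entries -> 0

full = temp_v @ S                    # zeroed entries contribute nothing
sparse = v[indices] @ S[indices, :]  # only the selected rows
assert np.allclose(full, sparse)
```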

magsol commented 8 years ago

As per our earlier discussions, we'll be making row-vs-column-wise operations a command-line flag (see ticket #52).