quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

P4: Deflation of matrix #46

Closed magsol closed 8 years ago

magsol commented 8 years ago

This step of the algorithm involves computing the outer product of two vectors u and v and subtracting that product off the distributed (RDD) matrix S.

This is tough, because multiplying u and v will result in a matrix with the same dimensions as S; thus, we cannot perform typical in-core multiplication of these vectors.

Instead, we can broadcast both vectors over the cluster and perform an element-wise subtraction using a single map.

  1. Broadcast u and v to the workers, e.g. sc.broadcast(u) and sc.broadcast(v).
  2. Run a map over the RDD.
  3. In each mapper, generate a new row vector by subtracting the corresponding row of the outer product of u and v from the current row of S.
  4. Return the row vector to create a new, deflated distributed RowMatrix (RDD).
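
A minimal sketch of the per-row step, not the project's actual implementation. It assumes each element of the RDD is a `(row_index, row)` pair so the mapper knows which entry of u applies; that keying, the helper name `deflate_row`, and the sample data are all hypothetical. A plain Python list stands in for the RDD so the snippet runs without a cluster.

```python
def deflate_row(indexed_row, u, v):
    """Subtract row i of the outer product of u and v from row i of S."""
    i, row = indexed_row
    # Row i of the outer product is just u[i] * v, so the full
    # outer-product matrix never has to be materialized.
    return i, [s_ij - u[i] * v_j for s_ij, v_j in zip(row, v)]


# Local stand-in for the distributed matrix S: (row_index, row) pairs.
S = [(0, [1.0, 2.0, 3.0]),
     (1, [4.0, 5.0, 6.0])]
u = [1.0, 2.0]        # one entry per row of S
v = [1.0, 0.0, 1.0]   # one entry per column of S

deflated = [deflate_row(kv, u, v) for kv in S]

# On a cluster, the same function would be applied through RDD.map,
# reading the vectors from broadcast variables (S_rdd is hypothetical):
#   bu, bv = sc.broadcast(u), sc.broadcast(v)
#   S_deflated = S_rdd.map(lambda kv: deflate_row(kv, bu.value, bv.value))
```

Because each mapper only needs its own row, one scalar from u, and all of v, the work is a single pass over S with no shuffle.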

MOJTABAFA commented 8 years ago

@magsol I'm trying to test the code in thunder. The problem is that we already use op_select from R1DL, so we need to import a function from R1DL.py. How should we add that Python file to our library? I already copied the R1DL.py file into "../python2.7/lib/site-packages", but thunder still cannot find it. Do we need to do anything else to use a .py file as a module inside the script?

magsol commented 8 years ago

You want the --py-files option of spark-submit: http://spark.apache.org/docs/latest/submitting-applications.html

MOJTABAFA commented 8 years ago

@LindberghLi Xiang, would you please analyze the new Z file: Z2.txt

XiangLi-Shaun commented 8 years ago

@MOJTABAFA

I have checked the file. While its dimensions seem to match those of the D matrix (thus the u vectors), its rows and columns have not been normalized, so it should not be D. I'll mark the places in the code that I think need revising with regard to the "transposed S" problem we have just discussed.
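
The normalization check described here could be sketched roughly as follows. This is an illustration under assumptions, not code from the repository: it assumes "normalized" means unit L2 norm, takes the matrix as an in-memory list of rows (the format of Z2.txt is not specified in this thread, so no loader is shown), and the names `l2_norms` and `is_unit_normalized` are hypothetical.

```python
import math


def l2_norms(vectors):
    """Return the L2 norm of each vector in an iterable of vectors."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in vectors]


def is_unit_normalized(matrix, tol=1e-6):
    """Return (rows_ok, cols_ok): whether every row, and every column,
    has L2 norm within tol of 1. Unit L2 norm is an assumption here."""
    rows_ok = all(abs(n - 1.0) <= tol for n in l2_norms(matrix))
    # zip(*matrix) iterates over the columns of a list-of-rows matrix.
    cols_ok = all(abs(n - 1.0) <= tol for n in l2_norms(zip(*matrix)))
    return rows_ok, cols_ok


identity = [[1.0, 0.0], [0.0, 1.0]]   # rows and columns are unit vectors
scaled = [[3.0, 0.0], [0.0, 4.0]]     # neither rows nor columns are

identity_check = is_unit_normalized(identity)
scaled_check = is_unit_normalized(scaled)
```

A check like this only says whether a matrix could plausibly be D; it does not confirm that the values themselves are correct.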

magsol commented 8 years ago

I still think we really need a script that does this comparison for us.


On Dec 31, 2015, at 16:49, LindberghLi notifications@github.com wrote:

@MOJTABAFA

I have checked the file. While its dimensions seem to match those of the D matrix (thus the u vectors), its rows and columns have not been normalized, so it should not be D. I'll mark the places in the code that I think need revising with regard to the "transposed S" problem we have just discussed.


magsol commented 8 years ago

@LindberghLi Any progress on marking the code where you think the problems are cropping up?