Closed magsol closed 8 years ago
@magsol I'm trying to test the code in thunder. The problem is that we already use op_select from R1DL, so we need to import that function from R1DL.py. How should we add that Python file to our library? I already copied R1DL.py into "../python2.7/lib/site-packages", but thunder still cannot find it. Is there something else we need to do to use a .py file as a module inside the script?
You want the `--py-files` option: http://spark.apache.org/docs/latest/submitting-applications.html
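For reference, PySpark also offers a programmatic way to do the same thing from inside a running session, `SparkContext.addPyFile`. A minimal sketch, assuming `sc` is the SparkContext the shell already provides; the helper name `ship_dependency` and the file path are hypothetical:

```python
def ship_dependency(sc, path):
    """Distribute a .py file to the executors of an existing SparkContext.

    `sc` is assumed to be a live SparkContext; `path` is a hypothetical
    location of the module to ship (e.g. wherever R1DL.py lives).
    """
    sc.addPyFile(path)  # real PySpark method; makes the file importable in tasks


# Possible usage inside the shell (not executed here):
# ship_dependency(sc, "/path/to/R1DL.py")
# from R1DL import op_select
```

Whether this or the command-line option is more convenient probably depends on how thunder launches Spark in your setup.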
@MOJTABAFA
I have checked the file. While its dimensions match those of the D matrix (i.e., the u vectors), its rows and columns have not been normalized, so it should not be D. I'll mark the places in the code that I think need revising with regard to the "transposed S" problem we just discussed.
I still think we really need a script that does this comparison for us.
On Dec 31, 2015, at 16:49, LindberghLi notifications@github.com wrote:
@MOJTABAFA
I have checked the file. While its dimensions match those of the D matrix (i.e., the u vectors), its rows and columns have not been normalized, so it should not be D. I'll mark the places in the code that I think need revising with regard to the "transposed S" problem we just discussed.
@LindberghLi Any progress on marking the code where you think the problems are cropping up?
This step of the algorithm involves computing the outer product of two vectors `u` and `v` and subtracting that product off the distributed (RDD) matrix `S`.

This is tough, because multiplying `u` and `v` will result in a matrix with the same dimensions as `S`; thus, we cannot perform typical in-core multiplication of these vectors.

Instead, we can broadcast both vectors over the cluster and perform an element-wise subtraction using a single `map`.

- Broadcast `u` and `v` to the workers, e.g. `sc.broadcast(u)` and `sc.broadcast(v)`.
- Run a `map` over the RDD.
- In the `map`, subtract the corresponding entries of `u * v` from each element.
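A rough sketch of how the broadcast-and-map step could look. It assumes `S` is stored as an RDD of `(row_index, numpy_row)` pairs, which is a guess at the layout rather than what the code necessarily uses; the function names `deflate_row` and `deflate` are hypothetical. The per-row update is kept as a plain NumPy function so it can be checked locally without a cluster:

```python
import numpy as np


def deflate_row(indexed_row, u, v):
    """Subtract row i of the outer product of u and v from row i of S."""
    i, row = indexed_row
    # Row i of outer(u, v) is u[i] * v, so the full product is never built.
    return i, row - u[i] * v


def deflate(sc, S, u, v):
    """Return S - outer(u, v) as a new RDD.

    Assumes `sc` is a SparkContext and `S` is an RDD of
    (row_index, numpy_row) pairs (hypothetical layout).
    """
    bu = sc.broadcast(u)  # ship u to the workers once
    bv = sc.broadcast(v)  # ship v to the workers once
    return S.map(lambda kv: deflate_row(kv, bu.value, bv.value))


# Local check of the per-row logic, no Spark required.
S_local = np.arange(6, dtype=float).reshape(3, 2)
u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0])
rows = [deflate_row((i, S_local[i]), u, v) for i in range(S_local.shape[0])]
result = np.vstack([r for _, r in rows])
```

Here `result` should match `S_local - np.outer(u, v)`, which is a cheap way to sanity-check the row-wise update before running it on the cluster.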