yxsu / jpetuum

Java version of Petuum

Update the table A stored in the PS table using all the data in A in each thread #1

Open JunjieHu opened 9 years ago

JunjieHu commented 9 years ago

Suppose I have a table A stored in the PS table. I partition the table A into n x m blocks, where A(i,j) denotes the (i,j)-th block by rows and columns.

In each iteration, every thread updates just ONE block of table A. Suppose a thread needs to update block A(i,j), but computing that update requires ALL the elements of A. The thread therefore has to read every line of A, which I think will slow the program down. Do you have any idea how to avoid reading all the data in A?

This differs from matrix factorization (MF): to update the i-th row of L and the j-th column of R, an MF thread doesn't need all the data in the L and R tables.

-Junjie

yxsu commented 9 years ago

You need to read a whole line with "get", but you can write a shorter line within the block using the "updateBatch" function.

The only way to reduce the number of unneeded elements you read is to store each block as a line.
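As a rough illustration of storing each block as a line, the sketch below flattens block A(i,j) into a single row whose id is derived from (i,j). The class `BlockRowLayout`, its methods, and the `HashMap` standing in for the PS table are all hypothetical names made up for this example; none of them are jpetuum API, and the real "get" and "updateBatch" calls would replace the map accesses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of jpetuum): one table row per block A(i,j). */
final class BlockRowLayout {
    private final int blockCols;   // m: number of block columns
    private final int blockHeight; // number of rows inside one block
    private final int blockWidth;  // number of columns inside one block
    // Assumption: a plain map stands in for the PS table (row id -> flattened block).
    private final Map<Integer, double[]> table = new HashMap<>();

    BlockRowLayout(int blockRows, int blockCols, int blockHeight, int blockWidth) {
        this.blockCols = blockCols;
        this.blockHeight = blockHeight;
        this.blockWidth = blockWidth;
        for (int i = 0; i < blockRows; i++) {
            for (int j = 0; j < blockCols; j++) {
                table.put(rowId(i, j), new double[blockHeight * blockWidth]);
            }
        }
    }

    /** Row id under which block A(i,j) is stored. */
    int rowId(int i, int j) {
        return i * blockCols + j;
    }

    /** Position of element (r, c) of a block inside its flattened row. */
    int offset(int r, int c) {
        return r * blockWidth + c;
    }

    /** Reads exactly one block: a single row lookup instead of many partial rows. */
    double[] getBlock(int i, int j) {
        return table.get(rowId(i, j));
    }

    /** Adds the same delta to every element of block A(i,j). */
    void addToBlock(int i, int j, double delta) {
        double[] row = table.get(rowId(i, j));
        for (int k = 0; k < row.length; k++) {
            row[k] += delta;
        }
    }

    public static void main(String[] args) {
        BlockRowLayout layout = new BlockRowLayout(2, 3, 4, 5);
        layout.addToBlock(1, 2, 0.5);
        System.out.println("row id of A(1,2): " + layout.rowId(1, 2));
        System.out.println("A(1,2)[3][4]: " + layout.getBlock(1, 2)[layout.offset(3, 4)]);
    }
}
```

With this layout, a thread that owns block A(i,j) reads and writes one row only. It does not help when the update itself depends on every block of A.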

JunjieHu commented 9 years ago

Let me make this clearer. I mean that all the data in table A is used in each iteration. For example:

    void runnable(int localThreadId) {
        // I need all the values in table A to calculate the updated value of A.
        wtw = v' * A * v
        // ** Here I have to read all of A with get(), line by line, which is time-consuming.
        // Unlike MF, which only needs L(i) and R(j) rather than the whole matrix.

        // Perform the update to block A(i,j).
        A(i,j) = A(i,j) + beta * wtw
    }

yxsu commented 9 years ago

I think you should avoid updating the whole table A in a single runnable function.
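One possible reading of this advice, assuming wtw is the scalar quadratic form v' * A * v from the example above and v is split into the same row and column ranges as A: the quadratic form is the sum over all blocks of v_i' * A(i,j) * v_j, so each thread could compute only the term for the block it already holds, and the partial terms could then be added up. The sketch below shows just the arithmetic. `BlockQuadraticForm` and its methods are hypothetical names, no jpetuum calls are used, and how the partial sums would be shared between threads (and how stale they may be while other blocks are being updated) is left open.

```java
/** Hypothetical sketch (not jpetuum API): v' * A * v as a sum of per-block terms. */
final class BlockQuadraticForm {

    /** Term of one block: vi' * block * vj, with the block stored row-major. */
    static double blockTerm(double[] block, double[] vi, double[] vj) {
        double sum = 0.0;
        for (int r = 0; r < vi.length; r++) {
            double rowDot = 0.0;
            for (int c = 0; c < vj.length; c++) {
                rowDot += block[r * vj.length + c] * vj[c];
            }
            sum += vi[r] * rowDot;
        }
        return sum;
    }

    /** Reference value computed from the whole matrix, for comparison only. */
    static double quadraticForm(double[][] a, double[] v) {
        double sum = 0.0;
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[r].length; c++) {
                sum += v[r] * a[r][c] * v[c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] a = {
            {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}
        };
        double[] v = {1, 2, 3, 4};
        double[] v0 = {1, 2};
        double[] v1 = {3, 4};
        // The four 2 x 2 blocks of a, each flattened row-major.
        double partial = blockTerm(new double[] {1, 2, 5, 6}, v0, v0)
                + blockTerm(new double[] {3, 4, 7, 8}, v0, v1)
                + blockTerm(new double[] {9, 10, 13, 14}, v1, v0)
                + blockTerm(new double[] {11, 12, 15, 16}, v1, v1);
        System.out.println("sum of block terms: " + partial);
        System.out.println("full quadratic form: " + quadraticForm(a, v));
    }
}
```

Each call to `blockTerm` touches one block and two slices of v, so no thread has to read the whole table to produce its share of wtw.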