quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

Add some toy test data #4

Closed magsol closed 8 years ago

magsol commented 8 years ago

Need some test data to use as testing input for the prototype.

MOJTABAFA commented 8 years ago

Dear Xiang, what is the expected random range in the following instruction of the program? unsigned int myseed = (unsigned int) RAND_MAX * rand();

XiangLi-Shaun commented 8 years ago

The range is (-1, 1) (note that it is an open range). But the instruction:

unsigned int myseed = (unsigned int) RAND_MAX * rand();

is for generating the seed. Thanks.

MOJTABAFA commented 8 years ago

Thanks. Since Python has no RAND_MAX constant, I defined it as a constant value, RAND_MAX = 2147483647, and converted your code as follows:

def stat_randVCT(vct_input, count_row):
    myseed = RAND_MAX * random.random()
    for idx_row in range(count_row):
        vct_input = 2*(random.random() / (RAND_MAX + 1.0))-1
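For comparison, here is a minimal sketch of how a vector of values in the open range (-1, 1) might be drawn in Python. It rests on a few assumptions that are not from the thread: the function returns a new list instead of filling `vct_input`, and the `seed` parameter is hypothetical. Python's `random.random()` already returns a float in [0.0, 1.0), so no division by `RAND_MAX` is needed, and the loop as posted appears to rebind `vct_input` on each pass rather than fill its elements.

```python
import random


def stat_randVCT(count_row, seed=None):
    """Return count_row floats drawn uniformly from the open interval (-1, 1).

    The signature is hypothetical: it returns a new list and takes an
    optional seed, unlike the version posted in the thread.
    """
    rng = random.Random(seed)  # private generator; seed=None uses OS entropy
    vct = []
    while len(vct) < count_row:
        # rng.random() lies in [0.0, 1.0), so x lies in [-1.0, 1.0)
        x = 2.0 * rng.random() - 1.0
        if x != -1.0:  # reject the closed endpoint so the range stays open
            vct.append(x)
    return vct
```

Passing the same `seed` twice reproduces the same vector, which may be convenient for toy test data.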

MOJTABAFA commented 8 years ago

Dear Xiang: in Python's numpy module, two simple instructions cover multiplying a vector by a matrix and multiplying two matrices, so we can easily code op_vctbymtx and Op_mtxbyvct as follows: "v = np.dot(S,u_old)" "u_new = np.multiply(S,v)" or "u_new = np.dot(S,v)"

However, I need to know more about the role of R, idxs_n and op_selectTopR. Please explain them in more detail so that I can convert them right after your explanation.

In particular, please explain the purpose of the following instruction: op_selectTopR( v, P, idxs_n, R )

Thanks again for all your cooperation.
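The two NumPy calls quoted in the comment above are not interchangeable: `np.dot` computes a matrix product, while `np.multiply` multiplies element by element with broadcasting. The sketch below uses a small square `S` (hypothetical toy data, chosen so that both expressions run as written) to show the difference; the shapes in the actual project may differ.

```python
import numpy as np

# Hypothetical toy data, chosen square so both products are defined.
S = np.array([[1.0, 2.0],
              [3.0, 4.0]])
u_old = np.array([1.0, 1.0])

v = np.dot(S, u_old)        # matrix-vector product: [3., 7.]

u_dot = np.dot(S, v)        # matrix-vector product, shape (2,): [17., 37.]
u_mul = np.multiply(S, v)   # elementwise with broadcasting, shape (2, 2)

print(u_dot.shape, u_mul.shape)
```

Since `np.multiply(S, v)` returns a matrix rather than a vector, `np.dot` is presumably the call that matches a matrix-by-vector operation.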

XiangLi-Shaun commented 8 years ago

op_selectTopR( v, P, idxs_n, R ) will select the top R number of elements from the input vector v (sparse vector with length P). The selected indices will be stored in idxs_n.
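A minimal Python sketch of this kind of selection is shown below. Several points are assumptions, not taken from the thread: "top" is read as largest absolute value, the indices are returned as a list instead of being written into `idxs_n`, and the length P is taken from the input itself. The name `select_top_r` is hypothetical.

```python
import heapq


def select_top_r(v, R):
    """Return the indices of the R entries of v with the largest magnitude.

    Assumption: "top" means largest absolute value. Indices are returned
    in order of decreasing magnitude.
    """
    if not 0 <= R <= len(v):
        raise ValueError("R must be between 0 and len(v)")
    # nlargest ranks the index positions by the magnitude of their entries
    return heapq.nlargest(R, range(len(v)), key=lambda i: abs(v[i]))


idxs_n = select_top_r([0.1, -3.0, 0.0, 2.0, -0.5], 2)
print(idxs_n)  # [1, 3]
```

If "top" instead means largest signed value, the `key` would drop the `abs` call.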