quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

rand_vct function #21

Closed MOJTABAFA closed 8 years ago

MOJTABAFA commented 8 years ago

@magsol

Based on today's discussion with Xiang, it seems our rand_vct function could easily be translated into a single line of Python. Today I found that the only reason Xiang used RAND_MAX was to normalize the random number into (-1, 1), since RAND_MAX is the maximum value C's rand() can return. However, Python's random generator already gives us a number between 0 and 1, so we don't need RAND_MAX. So, do you have any idea how to change "stat_randVCT"? Thanks.

magsol commented 8 years ago

What is stat_randVCT computing?

MOJTABAFA commented 8 years ago

@magsol Actually I've converted it line by line as:

def stat_randVCT(vct_input, count_row):
    myseed = RAND_MAX * random.random()
    for idx_row in range(count_row):
        vct_input[idx_row] = 2 * (random.random() / (RAND_MAX + 1.0)) - 1

It seems it will generate random values between -1 and 1 for the u vector.

@LindberghLi : Am I right xiang ?

XiangLi-Shaun commented 8 years ago

As I explained earlier, we do not need this function for generating the random number (at least not one this complicated). In C, the rand() function outputs a value between 0 and RAND_MAX (which is not suitable for our model), so I wrote the function to convert the output to the range -1 to 1. But the random.random() method in Python outputs a value between 0 and 1 (which is suitable), or we could easily map it to -1 to 1. @MOJTABAFA you really need to think about what all those functions are doing and refer to my pseudocode, rather than directly translating the lines of my C++ code.
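To illustrate the point above, here is a minimal sketch (not from the thread) of how the C idiom maps to Python/NumPy without any RAND_MAX:

```python
import numpy as np

# C idiom: 2 * (rand() / (RAND_MAX + 1.0)) - 1
# maps rand()'s output in [0, RAND_MAX] onto [-1, 1).
# np.random.random() already returns values in [0, 1),
# so the same mapping needs no RAND_MAX:
u = 2.0 * np.random.random(100) - 1.0

# every entry now lies in [-1, 1)
print(u.min() >= -1.0 and u.max() < 1.0)  # True
```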

@magsol

magsol commented 8 years ago

If the goal is to generate a random vector with 0 mean and unit variance (as the pseudocode says), this can be done easily:

import numpy as np
x = np.random.random(100)
x = (x - x.mean()) / x.std()

Doesn't matter what the range of the random number generator is; the last line will normalize the vector to have 0-mean, unit variance.

XiangLi-Shaun commented 8 years ago

@magsol as a clarification, we need unit length (L2 norm), rather than unit variance. The two are not equivalent (even when the mean is 0); they differ by a factor depending on the number of elements.
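The factor Xiang refers to can be checked numerically (a quick sketch, not part of the original thread): for a zero-mean vector of length n, the L2 norm equals the standard deviation times sqrt(n), so unit variance and unit L2 norm coincide only when n = 1.

```python
import numpy as np

n = 100
x = np.random.random(n)
x = x - x.mean()  # zero-mean vector

# For zero-mean x: ||x||_2 = sqrt(sum(x_i^2)) = std(x) * sqrt(n),
# so a unit-variance vector has L2 norm sqrt(n), not 1.
print(np.allclose(np.linalg.norm(x), x.std() * np.sqrt(n)))  # True
```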

magsol commented 8 years ago

Ah you are right, my confusion.

Just use the same sla.norm() logic as before.

MOJTABAFA commented 8 years ago

@magsol you mean the code should be changed to:

import numpy as np
from numpy import linalg as sla
x = np.random.random(100)
x = (x - x.mean()) / sla.norm(x)

MOJTABAFA commented 8 years ago

The output would be:

[ 0.03551994 0.01594881 0.03460392 -0.0746732 -0.00752084 0.07318538 -0.01409152 -0.07899736 -0.03651929 -0.07270972 0.08805686 -0.02360807 -0.02069175 0.04203834 -0.06384064 0.07731459 0.09060918 0.01628939 -0.02079642 -0.08641691 -0.00467374 0.00664094 0.08923932 -0.08294974 0.04480922 0.02840031 -0.04807477 0.10083753 0.04162838 -0.06878034 -0.01358638 -0.04771101 0.07562907 -0.01824321 -0.07452072 -0.06248551 -0.06547711 -0.01946673 0.03357896 -0.00867409 0.04949603 -0.078095 0.01277521 -0.08563109 0.04743097 -0.01854184 0.01310242 -0.01013395 0.0095422 0.0058783 -0.04594754 0.08088013 0.0110568 -0.01084716 0.02918186 -0.01373075 0.09416104 0.07396624 -0.02926449 0.01023752 -0.0127202 0.0142896 -0.0730411 -0.0278912 0.00171085 -0.04207635 -0.01940319 -0.02145024 -0.00112342 0.0274331 -0.05375385 -0.08173893 -0.05960357 -0.02117121 0.04687527 -0.02070303 0.05694133 0.02854981 0.03396535 0.02093238 0.08417697 0.06728906 0.03076218 -0.02976543 -0.03141107 -0.02294438 -0.02283221 0.01866632 -0.02752554 0.02527332 0.04427503 0.06063859 -0.05551457 -0.01118008 -0.05078002 0.02729091 0.03539697 0.10285661 -0.00132651 -0.0647055 ]

MOJTABAFA commented 8 years ago

In this case, most of the generated values fall between -0.1 and +0.1; occasionally, though, a value exceeds 0.1 or falls below -0.1.

MOJTABAFA commented 8 years ago

@LindberghLi please check the above three comments; is this random output what you had in mind?

XiangLi-Shaun commented 8 years ago

Currently the output does not have unit L2 norm; the code should be:

x = (x - x.mean())
x = x / sla.norm(x)
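The practical difference between this two-step version and the earlier one-liner shows up in the resulting norms (a sketch for illustration, not from the thread):

```python
import numpy as np
from numpy import linalg as sla

x = np.random.random(100)

# One-liner: divides the centered vector by the norm of the *uncentered* x,
# so the result's L2 norm is ||x - mean|| / ||x||, which is < 1 here.
a = (x - x.mean()) / sla.norm(x)

# Two-step: center first, then divide by the norm of the *centered* vector,
# which guarantees unit L2 norm.
b = x - x.mean()
b = b / sla.norm(b)

print(sla.norm(a) < 1.0)             # True: not unit length
print(np.isclose(sla.norm(b), 1.0))  # True: unit length
```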

MOJTABAFA commented 8 years ago

@LindberghLi

Thanks, but what's the difference between your code and mine?

x = np.random.random(100)
x = (x - x.mean()) / sla.norm(x)

The outputs are still mostly in [-0.1, +0.1]. Is this kind of output correct?

thanks.

XiangLi-Shaun commented 8 years ago

could you post the output?

MOJTABAFA commented 8 years ago

@LindberghLi

[ 0.09832688 -0.02040473 0.02738486 -0.0363205 0.01820481 -0.01511018 0.00850818 -0.05564701 -0.06840739 -0.06108096 -0.04004084 0.05906173 0.10915359 0.06600141 0.00609825 -0.0297483 0.01480897 -0.04339008 -0.05135455 0.07887796 -0.03644726 -0.06592085 -0.07369513 -0.06081114 -0.07535426 -0.00722965 -0.06795739 -0.00946846 -0.07845813 0.02838731 0.08122839 0.04107506 0.06546979 0.08405445 0.0508227 -0.03488699 0.0214938 0.04932626 -0.01639934 0.05786784 0.07197834 -0.06588392 -0.03445111 0.07736846 -0.05449369 0.06733942 0.05988861 -0.0325012 -0.06208948 0.04559102 0.09466845 0.03325675 0.01706473 0.0183485 -0.03004451 -0.04473086 -0.03866415 0.02532387 0.0610062 0.03790272 0.0881521 0.03128176 0.01592531 0.07005575 -0.07898337 -0.07271774 -0.0412937 -0.02838894 -0.03006732 0.03094221 0.04630121 -0.02674015 0.0237484 0.06020184 -0.01346322 -0.03857951 -0.07144529 -0.05249533 -0.07785283 0.09610857 -0.02561271 -0.03593169 0.08723996 0.07886607 -0.02040836 0.08062433 -0.02383257 -0.04896703 0.08364456 -0.06680343 -0.05311241 -0.00170316 -0.07322618 0.05427392 0.05896741 -0.03558496 -0.05655617 -0.04753905 -0.08190164 -0.06802389]

XiangLi-Shaun commented 8 years ago

The output is not right; are you sure you followed the procedure?

import numpy as np
from numpy import linalg as sla
x = np.random.random(100)
x = (x - x.mean())
x = x / sla.norm(x)

MOJTABAFA commented 8 years ago

@LindberghLi I did the following:

import numpy as np
from numpy import linalg as sla
x = np.random.random(100)
x = (x - x.mean()) / sla.norm(x)

However, there is no difference from your code; I already ran that code too, and the answer is as follows:

[ 0.01397131 -0.01160816 -0.05265727 0.03866572 -0.01645476 0.08325194 -0.11803957 -0.10673007 -0.05559012 0.02767274 -0.0661113 -0.14512182 -0.0067645 0.10720202 -0.09917398 -0.09731793 -0.06629394 0.07251171 0.16017651 -0.13594152 0.05045475 0.10502956 -0.14896338 -0.03871363 -0.11151628 0.01510344 0.09157133 -0.05050201 -0.09559871 0.09664996 -0.06860539 0.16028138 0.0126516 0.1012264 0.02489751 -0.03231539 0.16368335 0.00865726 -0.00067024 0.11036461 0.10803686 -0.06150751 0.09824492 0.01303202 -0.08824158 0.16938295 -0.11807805 -0.1155784 0.0384936 -0.14416773 -0.00056994 -0.13965506 -0.1364072 -0.02821911 0.14952093 0.06150571 -0.15367746 -0.10389123 -0.14494586 0.01496298 -0.02832725 -0.09547539 0.11287961 -0.06952172 -0.06352629 -0.08937357 0.15601137 0.16834175 0.14552653 0.01892126 0.00146 0.1340024 -0.09681039 0.11180743 -0.07329982 0.1014959 -0.04987375 -0.03336777 0.04294137 -0.12487522 0.00897264 0.17381436 -0.08279162 0.00262042 -0.1410245 -0.15258712 -0.09310202 0.07489392 -0.07194362 -0.04871598 0.16791106 0.05811589 0.17214072 0.14800187 -0.14532072 -0.09887269 0.09482788 0.15874922 0.11018884 0.05761107]

XiangLi-Shaun commented 8 years ago

Yes, the current result (starting from "0.01397131") is correct; you should use the code I mentioned.
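Putting the thread's conclusion together, the agreed-upon procedure can be wrapped as a small helper (a sketch only; the function name rand_vct follows the issue title):

```python
import numpy as np
from numpy import linalg as sla

def rand_vct(n):
    """Return a random vector with zero mean and unit L2 norm."""
    x = np.random.random(n)
    x = x - x.mean()        # center: zero mean
    return x / sla.norm(x)  # scale: unit L2 norm

u = rand_vct(100)
print(np.isclose(u.mean(), 0.0))     # True
print(np.isclose(sla.norm(u), 1.0))  # True
```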

MOJTABAFA commented 8 years ago

OK, thanks. I'll close the ticket now.