MOJTABAFA closed this issue 8 years ago.
What is stat_randVCT computing?
@magsol Actually, I've converted it line by line as follows:
import random
def stat_randVCT(vct_input, count_row):
    # Note: RAND_MAX is a C constant; Python's random module does not define it.
    myseed = RAND_MAX * random.random()
    for idx_row in range(count_row):
        vct_input[idx_row] = 2*(random.random() / (RAND_MAX + 1.0))-1
It seems that it will generate random values between -1 and 1 for the u vector.
@LindberghLi: Am I right, Xiang?
As I explained earlier, we do not need this function for generating random numbers (at least not one this complicated). In C, the rand() function returns a value between 0 and RAND_MAX (which is not suitable for our model), so I wrote the function to map that output to the range -1 to 1. In Python, however, random.random() already returns a value between 0 and 1 (which is suitable), and it can easily be mapped to -1 to 1. @MOJTABAFA, you really need to think about what all those functions are doing and refer to my pseudocode, rather than directly translating the lines of my C++ code.
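A minimal sketch of that idea, assuming the function keeps the signature of the line-by-line port (fill the first count_row entries of vct_input in place). Because random.random() returns a float in [0.0, 1.0), the linear map 2*r - 1 lands in [-1.0, 1.0) with no RAND_MAX scaling at all.

```python
import random

def stat_randVCT(vct_input, count_row):
    """Fill vct_input[0:count_row] with uniform values in [-1.0, 1.0).

    Sketch only: keeps the signature of the C++-style port but drops
    RAND_MAX, since random.random() already returns a float in [0.0, 1.0).
    """
    for idx_row in range(count_row):
        # Map [0.0, 1.0) linearly onto [-1.0, 1.0).
        vct_input[idx_row] = 2.0 * random.random() - 1.0

# Usage: fill a 5-element list in place.
u = [0.0] * 5
stat_randVCT(u, len(u))
print(u)
```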
@magsol
If the goal is to generate a random vector with 0 mean and unit variance (as the pseudocode says), this can be done easily:
import numpy as np
x = np.random.random(100)
x = (x - x.mean()) / x.std()
It doesn't matter what the range of the random number generator is; the last line normalizes the vector to zero mean and unit variance.
@magsol As a clarification, we need unit length (unit l2 norm) rather than unit variance. The two are not equivalent (even when the mean is 0); they differ by a factor of sqrt(n), where n is the number of elements.
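To make the factor concrete: for a zero-mean vector x of length n, ||x||_2 = sqrt(n) * std(x), where std is the population standard deviation (NumPy's default, ddof=0). So dividing by x.std(), as in magsol's snippet, yields a vector of norm sqrt(n) rather than 1. A quick check, assuming n = 100 as in that snippet:

```python
import numpy as np
from numpy import linalg as sla

n = 100
x = np.random.random(n)
x = x - x.mean()                 # zero mean

unit_var = x / x.std()           # unit (population) variance
unit_len = x / sla.norm(x)       # unit l2 norm

# For zero-mean x, ||x||_2 == sqrt(n) * std(x), so the two scalings differ by sqrt(n).
print(sla.norm(unit_var))        # ~ 10.0, i.e. sqrt(100)
print(sla.norm(unit_len))        # ~ 1.0
```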
Ah, you're right, my mistake.
Just use the same sla.norm() logic as before.
(Replying by email to LindberghLi's comment of Wed, Nov 25, 2015 at 21:07.)
@magsol Do you mean your code should be changed to:
import numpy as np
from numpy import linalg as sla
x = np.random.random(100)
x = (x - x.mean()) / sla.norm(x)
The output looks like this:
[ 0.03551994 0.01594881 0.03460392 -0.0746732 -0.00752084 0.07318538 -0.01409152 -0.07899736 -0.03651929 -0.07270972 0.08805686 -0.02360807 -0.02069175 0.04203834 -0.06384064 0.07731459 0.09060918 0.01628939 -0.02079642 -0.08641691 -0.00467374 0.00664094 0.08923932 -0.08294974 0.04480922 0.02840031 -0.04807477 0.10083753 0.04162838 -0.06878034 -0.01358638 -0.04771101 0.07562907 -0.01824321 -0.07452072 -0.06248551 -0.06547711 -0.01946673 0.03357896 -0.00867409 0.04949603 -0.078095 0.01277521 -0.08563109 0.04743097 -0.01854184 0.01310242 -0.01013395 0.0095422 0.0058783 -0.04594754 0.08088013 0.0110568 -0.01084716 0.02918186 -0.01373075 0.09416104 0.07396624 -0.02926449 0.01023752 -0.0127202 0.0142896 -0.0730411 -0.0278912 0.00171085 -0.04207635 -0.01940319 -0.02145024 -0.00112342 0.0274331 -0.05375385 -0.08173893 -0.05960357 -0.02117121 0.04687527 -0.02070303 0.05694133 0.02854981 0.03396535 0.02093238 0.08417697 0.06728906 0.03076218 -0.02976543 -0.03141107 -0.02294438 -0.02283221 0.01866632 -0.02752554 0.02527332 0.04427503 0.06063859 -0.05551457 -0.01118008 -0.05078002 0.02729091 0.03539697 0.10285661 -0.00132651 -0.0647055 ] [Finished in 0.5s]
In this case, most of the generated numbers fall between -0.1 and +0.1; occasionally a value is above 0.1 or below -0.1.
@LindberghLi Please check the three comments above. Is this random output what you had in mind?
Currently the output does not have unit l2 norm; the code should be:
x = (x - x.mean())
x = x / sla.norm(x)
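Why splitting the line matters: in the one-line form, the right-hand side is evaluated before x is reassigned, so sla.norm(x) is the norm of the original, uncentered vector. Since ||x||^2 = ||x - mean||^2 + n * mean^2, that norm is larger than the norm of the centered vector, and the result ends up shorter than unit length (for values drawn uniformly from [0, 1), the expected ratio is roughly sqrt((1/12) / (1/3)) = 0.5). A small comparison sketch, with x0 as an illustrative name:

```python
import numpy as np
from numpy import linalg as sla

x0 = np.random.random(100)

# One-line form: sla.norm(x0) is the norm of the *uncentered* vector.
one_line = (x0 - x0.mean()) / sla.norm(x0)

# Two-step form: center first, then divide by the norm of the centered vector.
two_step = x0 - x0.mean()
two_step = two_step / sla.norm(two_step)

print(sla.norm(one_line))   # strictly less than 1.0
print(sla.norm(two_step))   # 1.0 up to rounding
```

Both vectors point in the same direction; they differ only by a positive scalar, which is why the two outputs look alike apart from their magnitudes.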
@LindberghLi
Thanks, but what's the difference between the code you mentioned and mine?
x = np.random.random(100)
x = (x - x.mean()) / sla.norm(x)
The outputs are still mostly in [-0.1, +0.1]. Is this kind of output correct?
Thanks.
Could you post the output?
@LindberghLi
[ 0.09832688 -0.02040473 0.02738486 -0.0363205 0.01820481 -0.01511018 0.00850818 -0.05564701 -0.06840739 -0.06108096 -0.04004084 0.05906173 0.10915359 0.06600141 0.00609825 -0.0297483 0.01480897 -0.04339008 -0.05135455 0.07887796 -0.03644726 -0.06592085 -0.07369513 -0.06081114 -0.07535426 -0.00722965 -0.06795739 -0.00946846 -0.07845813 0.02838731 0.08122839 0.04107506 0.06546979 0.08405445 0.0508227 -0.03488699 0.0214938 0.04932626 -0.01639934 0.05786784 0.07197834 -0.06588392 -0.03445111 0.07736846 -0.05449369 0.06733942 0.05988861 -0.0325012 -0.06208948 0.04559102 0.09466845 0.03325675 0.01706473 0.0183485 -0.03004451 -0.04473086 -0.03866415 0.02532387 0.0610062 0.03790272 0.0881521 0.03128176 0.01592531 0.07005575 -0.07898337 -0.07271774 -0.0412937 -0.02838894 -0.03006732 0.03094221 0.04630121 -0.02674015 0.0237484 0.06020184 -0.01346322 -0.03857951 -0.07144529 -0.05249533 -0.07785283 0.09610857 -0.02561271 -0.03593169 0.08723996 0.07886607 -0.02040836 0.08062433 -0.02383257 -0.04896703 0.08364456 -0.06680343 -0.05311241 -0.00170316 -0.07322618 0.05427392 0.05896741 -0.03558496 -0.05655617 -0.04753905 -0.08190164 -0.06802389] [Finished in 0.5s]
The output is not right. Are you sure you have been following the procedure?
import numpy as np
from numpy import linalg as sla
x = np.random.random(100)
x = (x - x.mean())
x = x / sla.norm(x)
@LindberghLi I did the following:
import numpy as np
from numpy import linalg as sla
x = np.random.random(100)
x = (x - x.mean()) / sla.norm(x)
However, there is no difference from the code you mentioned; I already ran that code too, and the result is as follows:
[ 0.01397131 -0.01160816 -0.05265727 0.03866572 -0.01645476 0.08325194 -0.11803957 -0.10673007 -0.05559012 0.02767274 -0.0661113 -0.14512182 -0.0067645 0.10720202 -0.09917398 -0.09731793 -0.06629394 0.07251171 0.16017651 -0.13594152 0.05045475 0.10502956 -0.14896338 -0.03871363 -0.11151628 0.01510344 0.09157133 -0.05050201 -0.09559871 0.09664996 -0.06860539 0.16028138 0.0126516 0.1012264 0.02489751 -0.03231539 0.16368335 0.00865726 -0.00067024 0.11036461 0.10803686 -0.06150751 0.09824492 0.01303202 -0.08824158 0.16938295 -0.11807805 -0.1155784 0.0384936 -0.14416773 -0.00056994 -0.13965506 -0.1364072 -0.02821911 0.14952093 0.06150571 -0.15367746 -0.10389123 -0.14494586 0.01496298 -0.02832725 -0.09547539 0.11287961 -0.06952172 -0.06352629 -0.08937357 0.15601137 0.16834175 0.14552653 0.01892126 0.00146 0.1340024 -0.09681039 0.11180743 -0.07329982 0.1014959 -0.04987375 -0.03336777 0.04294137 -0.12487522 0.00897264 0.17381436 -0.08279162 0.00262042 -0.1410245 -0.15258712 -0.09310202 0.07489392 -0.07194362 -0.04871598 0.16791106 0.05811589 0.17214072 0.14800187 -0.14532072 -0.09887269 0.09482788 0.15874922 0.11018884 0.05761107] [Finished in 0.5s]
Yes, the current result (starting with "0.01397131") is correct; you should use the code I mentioned.
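A quick sanity check on the magnitudes, assuming n = 100 as in the snippets above: if x has unit l2 norm, its root-mean-square entry is fixed by n alone.

```latex
\|x\|_2^2 = \sum_{i=1}^{n} x_i^2 = 1
\quad\Longrightarrow\quad
\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} = \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{100}} = 0.1
```

Because the mean is also 0, this RMS equals the population standard deviation, so a correctly normalized 100-element vector has entries of typical size 0.1, with some entries noticeably larger in magnitude.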
OK, thanks. I'll close the ticket now.
@magsol
Based on today's discussion with Xiang, it seems that our stat_randVCT function could easily be translated to a single instruction in Python. Today I found that the only reason Xiang used RAND_MAX is to normalize the random number to the range (-1, 1), since RAND_MAX is the largest value rand() can return. However, Python has a random generator that returns a number between 0 and 1, so we don't need RAND_MAX. So, do you have any idea how to change "stat_randVCT"? Thanks.
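One possible single-line replacement, sketched with NumPy (an assumption: the thread does not settle which library to use for this step). np.random.uniform(low, high, size) draws size samples from the half-open interval [low, high), which covers the (-1, 1) range stat_randVCT was written for; count_row is reused here only as an illustrative vector length.

```python
import numpy as np

count_row = 100  # illustrative vector length

# Replaces the whole stat_randVCT loop: uniform samples in [-1.0, 1.0).
u = np.random.uniform(-1.0, 1.0, size=count_row)
print(u[:5])
```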