Closed MOJTABAFA closed 8 years ago
Thanks Mojtaba, I'll check it in a little while.
iPhone'd
On Nov 21, 2015, at 20:12, MOJTABAFA notifications@github.com wrote:
Assigned #9 to @magsol.
Normalizing a vector by its l2 norm is the same thing as making the vector unit length; you're dividing each element of the vector v[i] by the magnitude of the vector ||v||. To do this, use SciPy's linear algebra library:
import numpy as np
import scipy.linalg as sla
a = np.array([1, 2, 3, 4, 5], dtype = np.float)
print sla.norm(a) # "2.397587827269453"
b = a / sla.norm(a)
print sla.norm(b) # "1.0"
So the vector b is the normalized, unit-length version of the vector a.
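Written out, and assuming the l2 norm here is the usual Euclidean norm over the n elements of v (the symbols \hat{v} and n are notation introduced for this note, not taken from the comment above), the operation is roughly:

```latex
\|v\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2},
\qquad
\hat{v}_i = \frac{v_i}{\|v\|_2},
\qquad
\|\hat{v}\|_2 = 1
```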
@LindberghLi Xiang: would you please check Dr. Quinn's comment above on the normalization function? Let me know whether this kind of normalization solves your problem. Then I can work on it, try to optimize the normalization functions, and add them to the main program.
Just FYI: Kind of like other social media, you can tag people in notes with the '@' symbol, @MOJTABAFA.
OK, thanks. I will update the comment now.
@magsol I copied your code into Sublime, but there is an error. I'm not sure, but I think I didn't install SciPy. Am I right? Error:
Traceback (most recent call last):
File "C:\Users\Mojtaba Fazli\Desktop\normalization.py", line 2, in
... but "conda list" command shows that there is a scipy library installed in my system
@MOJTABAFA
The code by Shannon is good for the l-2 norm normalization, but you'll need a similar route for the zero-mean normalization as well.
@magsol Now it works. I had to change the SciPy import to:
from numpy import linalg as sla
@magsol But I don't know why its answers are different from the ones you mentioned.
import numpy as np
from numpy import linalg as sla
a = np.array([1, 2, 3, 4, 5], dtype = np.float)
print (sla.norm(a)) # "2.397587827269453"
c = sla.norm(a)
b = a / c
print (b) # "1.0"
=========={ out put }==========
7.4161984871
[ 0.13483997 0.26967994 0.40451992 0.53935989 0.67419986]
[Finished in 0.2s]
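A short check suggests the mismatch comes from the quoted comment values rather than from swapping the import. This is only a sketch, assuming Python 3 and NumPy alone; the names norm_a and norm_b are introduced here for illustration. The norm of [1, 2, 3, 4, 5] is sqrt(55), which agrees with the 7.416... printed above, and the "1.0" refers to the norm of b, not to b itself.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype=float)

# l2 norm of a: sqrt(1 + 4 + 9 + 16 + 25) = sqrt(55)
norm_a = np.linalg.norm(a)

# Dividing by the norm gives a unit-length vector.
b = a / norm_a
norm_b = np.linalg.norm(b)

print(norm_a)  # roughly 7.416
print(b)       # the normalized vector, not a single number
print(norm_b)  # 1.0 up to floating-point rounding
```

So printing b shows five values, while printing the norm of b shows the unit length.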
@LindberghLi What about this for mean-zero normalization?
import numpy as np
y = np.random.randn(10, 10)
print(y)
normed = (y - y.mean(axis=0)) / y.std(axis=0)
print('normed mean =',normed.mean(axis=0))
print('normed std =',normed.std(axis=0))
====================== out put =============
[[ 0.67912547 -0.51589505 -0.4424499 -0.16515243 -1.34762102 0.22626589 0.34721551 1.45637866 -0.24009679 0.21131739]
 [ 0.5058812 -0.77646398 0.47343891 -0.3821469 1.60240853 -0.54143379 0.15397853 -0.34675699 -0.47872134 -0.20981917]
 [ 1.55658057 0.14938842 0.1395679 0.03975734 0.10721487 -0.16563412 1.21940819 -0.47438863 -1.13381981 -0.20517275]
 [ 0.85730614 0.01776607 1.22002908 0.9858714 0.43821209 -0.23075819 1.20476702 -2.01791451 1.39054771 -1.49560731]
 [-1.49584899 -1.70729191 -0.36759594 0.44967996 -1.16665163 -0.47875628 -0.77648296 0.32686771 0.48212816 1.61136346]
 [ 0.96703504 0.35095139 0.38318928 0.94518336 -1.72319926 -0.15169197 -1.9715908 -0.62311711 -0.52933993 -0.05238334]
 [ 0.24992697 -1.4416581 -0.56934585 1.81037335 0.67048827 2.04979197 0.8786347 -1.27356192 -0.30720224 -1.54699837]
 [-0.54240094 -0.19582847 1.39024218 -1.7890984 0.39088153 -0.05736905 0.64651929 0.53540127 1.02180067 0.50595341]
 [ 0.00660171 0.56305246 1.84318845 0.30656014 -0.58597558 -0.83547812 -0.42220257 -0.60727885 -0.39588576 0.03611984]
 [-0.6444746 -0.31379341 -0.57833217 1.27285969 -0.68151022 1.52165882 -0.21176998 0.17241932 -1.27050344 0.90365203]]
normed mean= [ 8.88178420e-17 -5.68989300e-17 8.88178420e-17 5.55111512e-17 -1.66533454e-17 -2.22044605e-17 -2.77555756e-17 -7.21644966e-17 0.00000000e+00 -2.22044605e-17]
normed std= [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[Finished in 0.4s]
It seems good
@magsol Actually, I'm confused, because when I run Xiang's algorithm for normalization, my answer is as follows:
import numpy as np
mtx_input= np.arange(100).reshape(10,10)
print('original mat= \n',mtx_input)
for p in range(10):
    double_mean = 0
    for t in range (10):
        double_mean = mtx_input[[t], [p]] + double_mean
    double_mean = double_mean/10
    for t in range (10):
        mtx_input[[t], [p]] = mtx_input[[t], [p]] - double_mean
print(mtx_input)
================================out put ================
original mat=
[[ 0 1 2 3 4 5 6 7 8 9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]
normalized mat=
[[-45 -45 -45 -45 -45 -45 -45 -45 -45 -45]
 [-35 -35 -35 -35 -35 -35 -35 -35 -35 -35]
 [-25 -25 -25 -25 -25 -25 -25 -25 -25 -25]
 [-15 -15 -15 -15 -15 -15 -15 -15 -15 -15]
 [ -5 -5 -5 -5 -5 -5 -5 -5 -5 -5]
 [ 5 5 5 5 5 5 5 5 5 5]
 [ 15 15 15 15 15 15 15 15 15 15]
 [ 25 25 25 25 25 25 25 25 25 25]
 [ 35 35 35 35 35 35 35 35 35 35]
 [ 45 45 45 45 45 45 45 45 45 45]]
[Finished in 0.4s]
but when I try to do the zero-mean normalization with numpy the results are different:
import numpy as np
y = np.arange(100).reshape(10,10)
print('original mat= \n',y)
normed = (y - y.mean(axis=0)) / y.std(axis=0)
print('normed mean=',normed.mean(axis=0))
==================={ out put}====================
original mat=
[[ 0 1 2 3 4 5 6 7 8 9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]
normed mean= [ -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16 -1.11022302e-16]
[Finished in 0.4s]
@LindberghLi Xiang, do you have any thoughts on the comment above?
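One plausible reading of the difference, offered as a sketch rather than a diagnosis: the explicit loop only subtracts each column's mean, while the NumPy expression also divides by each column's standard deviation. The names Y, centered, and standardized below are introduced here for illustration, and the input is assumed to be the same 10x10 arange matrix, cast to float.

```python
import numpy as np

Y = np.arange(100, dtype=float).reshape(10, 10)

# Centering only: subtract each column's mean (what the loop version computes).
centered = Y - Y.mean(axis=0)

# Centering plus scaling: zero mean and unit standard deviation per column.
standardized = centered / Y.std(axis=0)

print(centered[:, 0])             # -45, -35, ..., 45, matching the loop output
print(standardized.mean(axis=0))  # zero up to floating-point rounding
print(standardized.std(axis=0))   # one up to floating-point rounding
```

Under that reading both results are zero-mean by column, and they differ only by the per-column scale factor.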
It looks like numpy.linalg.norm and scipy.linalg.norm have identical operations, which is good. However, we need to figure out what the bug is on your end in importing scipy because that is a critical library for other operations not present in numpy.
I haven't looked closely but I think your "version" of Xiang's algorithm has a bug in the array indexing that's resulting in different output. The numpy version you posted looks good.
Just FYI: in Python, the convention for vectors is lowercase variable names (like y), but for matrices it's uppercase (like Y). Also, when testing for zero-mean, unit-variance, you can do this:
A = np.random.random((10, 10))
print A.mean(axis = 0)
# [ 0.59888529, 0.40256814, 0.52723793, 0.5827174 , 0.35847958,
# 0.47607431, 0.58255637, 0.51890551, 0.56916436, 0.44384175]
print A.std(axis = 0)
# [ 0.23854139 0.27000021 0.28236851 0.30656586 0.2882628 0.33507456
# 0.24044369 0.20864492 0.24725187 0.34637215]
B = (A - A.mean(axis = 0)) / A.std(axis = 0)
print B.mean(axis = 0)
# [ 3.55271368e-16 -1.11022302e-16 -1.33226763e-16 -4.44089210e-17
# 6.10622664e-17 -1.11022302e-16 2.44249065e-16 5.27355937e-17
# -2.24820162e-16 9.99200722e-17]
print B.std(axis = 0)
# [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
@magsol Thank you very much. I already discussed this with Xiang and the problem is solved. However, the SciPy problem remains, and I need to meet with you after the holidays to check it on my laptop. I'll push the changed code to GitHub; please check it and close the ticket if everything looks good.
Dear Dr. Quinn: The normalization functions seem to be mostly heuristic, designed from experience to fit this problem. Thus it is not possible to find exactly equivalent functions in NumPy or SciPy. Therefore I think I should convert the normalization functions in Xiang's code line by line. For example, I wrote the following for "stat_normalize2l2NormVCT":
import numpy as np
vct_input = np.array([0,1,2,5,0],dtype=float)
T=5
double_l2norm = 0
for t in range(T):
    double_l2norm = vct_input[t]*vct_input[t] + double_l2norm
    print(vct_input[t])
double_l2norm = np.sqrt(double_l2norm)
for t in range (T):
    vct_input[t] = vct_input[t]/double_l2norm
print(vct_input)
===================={ out put}==========
0.0
1.0
2.0
5.0
0.0
[ 0. 0.18257419 0.36514837 0.91287093 0. ]
[Finished in 0.3s]
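For this particular function, the loop above appears to compute the same thing as dividing by numpy.linalg.norm. The following is a minimal sketch for comparing the two, assuming a nonzero 1-D input; the helper names normalize_l2 and normalize_l2_loop are hypothetical and do not come from Xiang's code.

```python
import numpy as np

def normalize_l2(vct_input):
    """Hypothetical helper: scale a nonzero vector to unit l2 norm."""
    vct = np.asarray(vct_input, dtype=float)
    return vct / np.linalg.norm(vct)

def normalize_l2_loop(vct_input):
    """Hypothetical helper mirroring the line-by-line loop version."""
    vct = np.array(vct_input, dtype=float)
    double_l2norm = 0.0
    for value in vct:
        # Accumulate the sum of squares, then take the square root.
        double_l2norm += value * value
    double_l2norm = np.sqrt(double_l2norm)
    return vct / double_l2norm

vct = [0, 1, 2, 5, 0]
print(normalize_l2(vct))
print(np.allclose(normalize_l2(vct), normalize_l2_loop(vct)))
```

If the other normalization functions turn out to have similar closed forms, the same kind of side-by-side check could be used before replacing a loop with a library call.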