openai / weightnorm

Example code for Weight Normalization, from "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks"
https://arxiv.org/abs/1602.07868
MIT License

Weights returned as nan #4

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi, I have a simple Keras CNN that works fine as it is. When I try to apply weightnorm, with either SGD or Adam, the first updated weights always come back as NaN, triggering an error. This is an example of one layer's configuration and weights just before model.fit():

    {'W_constraint': None, 'b_constraint': None, 'name': 'dense_3', 'activity_regularizer': None,
     'trainable': True, 'init': 'glorot_uniform', 'bias': True, 'activation': 'softmax',
     'input_dim': None, 'b_regularizer': None, 'W_regularizer': None, 'output_dim': 8}

    [array([[-1.81446958, -0.74279195, -1.98372281, 1.03149867, -1.33605921, 0.98080444, 1.46184123, -1.90489924],
            [1.74007297, 1.18310583, 0.96353596, -0.49502602, -1.5556761, 1.71657765, 0.94695097, 2.61784649],
            [-0.638098, -1.65658796, 0.45535672, 1.39707041, -0.53299773, -1.73198462, 0.05106336, -0.93136811],
            [-0.50413573, -0.12023554, -1.1118933, -1.12377524, 1.9663564, 1.5819149, -1.72357309, -0.63662446],
            [-1.6616931, 1.57845461, -1.33607149, 1.03262866, 1.02465236, -1.82984507, -1.94427574, 2.13097382],
            [-0.69643229, -1.69655061, 1.86963248, 1.35395622, 1.43264794, -1.60058153, 1.45158744, 1.88503206],
            [-0.1455002, 0.44617018, -0.47829607, -1.31520915, 1.82627797, 1.81214976, -0.27336141, 1.91040981],
            [-0.78067726, 1.90638936, -1.97633493, -1.061988, 0.02862636, -0.37745535, 1.65916157, 0.70244253],
            [-0.21252237, -0.65053529, 0.51744008, 0.68950123, -1.85650849, 1.0682615, 1.55790281, -0.83147609],
            [0.48371872, -0.85853142, -2.022681, -1.08805192, 2.06113982, -0.57459891, -1.63607311, -0.83574378],
            [1.05208552, -1.69211721, -0.43760285, 1.03213108, -2.36395407, -1.02809763, -0.806862, -1.45331335],
            [-1.12855673, 1.70107543, 1.35683572, -1.20369387, -0.18256012, 2.01939988, 1.03289509, 2.65198541],
            [0.51740509, -0.23014481, 1.95300198, -0.66845942, 0.53607529, -1.01613665, 1.18222928, -0.80191672],
            [0.39752519, 2.14175916, 1.48441279, -1.20377731, -1.87403321, -0.11191524, -1.76513219, 2.63831162],
            [-1.98938465, -1.2327646, -0.83744407, -0.64946407, 0.58288223, 2.24985504, -0.09591354, 2.01949072],
            [-1.42328095, 2.07457638, -1.33132982, -2.08888173, 1.02181983, 1.24852037, 1.10853899, -1.0029546],
            [1.75405586, 0.09432141, -1.31112003, -0.0304644, -1.5135988, -1.49612296, 1.2762996, 0.60811853],
            [-1.64439476, 1.7335813, -0.80541438, 0.27505419, 0.37458628, 0.72816306, 1.52508533, 1.85929],
            [-0.053883, -2.13568377, 0.55463415, 0.43602318, 1.61183143, 1.48652506, -2.10601187, -1.08352566],
            [-1.21685481, 0.41039792, -0.78186649, 1.60308003, 0.99902558, 1.60311925, 1.10065258, 0.0354073],
            [2.12806535, 2.14419603, 0.96948087, 0.08199508, -0.84324813, -1.50271273, 0.10528874, -0.873142],
            [-2.15096569, 1.23474431, 1.25909293, -0.44441026, -2.08873248, 0.21763401, -2.12321043, -1.31675696],
            [1.95354533, 1.73437381, 1.38008749, 1.28455055, -0.34766021, -2.20302415, 0.51172131, -1.0840373],
            [1.58691943, 1.4111464, -2.16242433, 1.90826643, -1.84906268, -1.18959498, -1.83963597, -0.12747419],
            [-0.4401913, 1.22723794, -1.53341997, 1.43126631, -0.95519918, 0.61142218, 1.61414647, -0.13954096],
            [-0.63068312, 1.03541517, 2.19619155, -0.71226257, 1.70391488, 2.243999, 1.81045079, -1.39369321],
            [0.22400506, 0.17860785, -1.42312717, 0.74690318, 0.66468042, -1.62544048, 1.75782633, 1.03065538],
            [2.11632895, 2.12409687, 1.10879564, 1.02491808, -0.37185353, 0.13456514, -1.70119786, -0.14151937],
            [-0.58504152, 2.31315374, 0.15611638, 1.2988714, 1.33584034, 0.29542622, -1.18843138, 0.54929841],
            [0.84831744, -2.25127149, -0.42340177, -0.99950933, -0.33759385, 0.73217863, -1.75246251, -0.20512277],
            [1.16061187, -1.81038654, -1.50839853, 1.90214121, -0.33019581, -1.18630064, -0.29908586, -1.13772762],
            [-0.85308987, -0.56074762, -0.22539173, -0.95188016, -0.25569537, 1.48671508, -0.4336201, 2.44569182]], dtype=float32),
     array([-0.75807816, -0.68674487, -0.79544491, -0.73615742, -0.74876821, -0.73147482, -0.74654377, -0.72675341], dtype=float32)]

and these are the weights after one epoch:

    [[ nan  nan  nan ...,  nan  nan  nan]
     [ nan  nan  nan ...,  nan  nan  nan]
     [ nan  nan  nan ...,  nan  nan  nan]
     ...,
     [ nan  nan  nan ...,  nan  nan  nan]
     [ nan  nan  nan ...,  nan  nan  nan]
     [ nan  nan  nan ...,  nan  nan  nan]]

It's the same for all layers. The data_based_init() works fine, by the way. Any clue what could be happening? I am using TF v12 with CUDA 8 and a GeForce 1080 GPU.
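
For context, the failing setup is roughly the following (a minimal sketch with placeholder data and layer sizes; it assumes the `AdamWithWeightnorm` optimizer and `data_based_init()` helper from this repo's `keras/weightnorm.py` are importable):

```python
# Minimal sketch of the failing setup (placeholder data and layer sizes).
# Assumes this repo's keras/weightnorm.py is on the path.
import numpy as np
from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dense
from weightnorm import AdamWithWeightnorm, data_based_init

x_train = np.random.rand(128, 32, 32, 3).astype('float32')   # placeholder inputs
y_train = np.eye(8)[np.random.randint(0, 8, 128)]             # placeholder one-hot labels

model = Sequential([
    Convolution2D(16, 3, 3, border_mode='same', activation='relu', input_shape=(32, 32, 3)),
    Flatten(),
    Dense(8, activation='softmax'),
])
model.compile(optimizer=AdamWithWeightnorm(), loss='categorical_crossentropy')

data_based_init(model, x_train[:100])     # this step runs without problems
model.fit(x_train, y_train, nb_epoch=1)   # weights come back as NaN after the first update
```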

hefeiwangyande commented 6 years ago

Hello, I have the same problem. When I use weight normalization only as the decomposition of W into g and v, the classification task runs normally (although the accuracy is not high). But when I set g and b according to the data-dependent parameter initialization in the paper, the NaNs appear. The code is as follows:

def conv2d(x, num_filters, filter_size=[3, 3], stride=[1, 1], pad='SAME',
           nonlinearity=None, init_scale=1., ema=None, **kwargs):
    ''' convolutional layer '''
    with tf.variable_scope('conv2d'):
        # data based initialization of parameters
        V = tf.get_variable('V', filter_size + [int(x.get_shape()[-1]), num_filters], tf.float32,
                            tf.random_normal_initializer(0, 0.05), trainable=True)
        V_norm = tf.nn.l2_normalize(V, [0, 1, 2])
        x_init = tf.nn.conv2d(x, V_norm, [1] + stride + [1], pad)
        m_init, v_init = tf.nn.moments(x_init, [0, 1, 2])
        scale_init = init_scale / tf.sqrt(v_init + 1e-8)
        g = get_var_maybe_avg('g', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(1.), trainable=True)
        b = get_var_maybe_avg('b', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(0.), trainable=True)
        g_u = tf.assign(g, g * scale_init)
        b_u = tf.assign_add(b, -m_init * scale_init)
        # with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
        #     g = tf.get_variable('g', dtype=tf.float32, initializer=scale_init, trainable=True)
        #     b = tf.get_variable('b', dtype=tf.float32, initializer=-m_init * scale_init, trainable=True)
        x_init = tf.reshape(g_u, [1, 1, 1, num_filters]) * x_init + tf.reshape(b_u, [1, 1, 1, num_filters])
        x = tf.nn.l2_normalize(x_init, dim=[0, 1, 2])
        if nonlinearity is not None:
            x = nonlinearity(x)
        return x
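
One likely source of the NaNs in the snippet above: `g_u` and `b_u` are `tf.assign` ops that feed directly into `x_init`, so `g` is rescaled by `scale_init` (and `b` shifted) on every forward pass rather than once at initialization, and the compounded rescaling can quickly drive the activations to NaN. Below is a minimal sketch of keeping the data-dependent init out of the regular training graph, loosely following the `init`-flag pattern from OpenAI's pixel-cnn `nn.py` (where `get_var_maybe_avg` comes from); the function and flag names are illustrative, not the author's code:

```python
# Minimal sketch, TF 1.x style. `init` is a Python flag that is True only when
# building the one data-dependent initialization pass.
def conv2d_wn(x, num_filters, filter_size=[3, 3], stride=[1, 1], pad='SAME',
              init=False, init_scale=1.):
    with tf.variable_scope('conv2d_wn'):
        V = tf.get_variable('V', filter_size + [int(x.get_shape()[-1]), num_filters],
                            tf.float32, tf.random_normal_initializer(0, 0.05))
        g = tf.get_variable('g', [num_filters], tf.float32, tf.constant_initializer(1.))
        b = tf.get_variable('b', [num_filters], tf.float32, tf.constant_initializer(0.))

        V_norm = tf.nn.l2_normalize(V, [0, 1, 2])
        x_out = tf.nn.conv2d(x, V_norm, [1] + stride + [1], pad)

        if init:
            # data-dependent init: rescale g and shift b exactly once,
            # then read the freshly assigned values for this pass only
            m_init, v_init = tf.nn.moments(x_out, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-8)
            with tf.control_dependencies([g.assign(g * scale_init),
                                          b.assign_add(-m_init * scale_init)]):
                g_cur, b_cur = tf.identity(g), tf.identity(b)
        else:
            # normal pass: no assign ops in the graph, so g and b are only
            # changed by the optimizer, not rescaled again on every step
            g_cur, b_cur = g, b

        return tf.reshape(g_cur, [1, 1, 1, num_filters]) * x_out + \
               tf.reshape(b_cur, [1, 1, 1, num_filters])
```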


harsh306 commented 6 years ago

https://github.com/harsh306/WeightNormalization

wkirgsn commented 6 years ago

Same problem here with Keras 2 (incorporating the pull request). No data-based init applied; I'm using weight norm for a single-layer GRU model. LSTM works fine, so I guess it has something to do with the initialization of the weights.
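
A minimal sketch of the kind of comparison described (Keras 2 API, placeholder data; it assumes the `SGDWithWeightnorm` optimizer from the pull-request version of `keras/weightnorm.py`):

```python
# Compare GRU vs LSTM under the weight-norm optimizer, no data-based init.
# Data shapes and layer sizes are placeholders.
import numpy as np
from keras.models import Sequential
from keras.layers import GRU, LSTM, Dense
from weightnorm import SGDWithWeightnorm

x = np.random.rand(256, 20, 8).astype('float32')   # placeholder sequences
y = np.random.rand(256, 1).astype('float32')        # placeholder targets

for rnn_layer in (GRU, LSTM):
    model = Sequential([rnn_layer(32, input_shape=(20, 8)), Dense(1)])
    model.compile(optimizer=SGDWithWeightnorm(), loss='mse')
    model.fit(x, y, epochs=1, verbose=0)
    # check whether any weight went to NaN after one epoch
    print(rnn_layer.__name__, 'has NaN weights:',
          any(np.isnan(w).any() for w in model.get_weights()))
```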