jordanott / DeepLearning

Course material from when I taught deep learning at Chapman University

2.4 Hyperparameter Optimization Question #1


ramir266 commented 4 years ago

Hey Jordan, looking at problem 2.4, how do you want us to implement the neural network? Do you want us to use:

Method #1

from keras.models import Sequential
from keras.layers import Dense
import keras

model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))    # output layer (10 classes) needed for categorical_crossentropy
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10, verbose=1, validation_data=(x_test, y_test))

Method #2

import numpy as np

alpha = 0.01                                                           # set learning rate
theta_1 = np.random.normal(0, .1, size=(2,3)); b1 = np.zeros((1,3))    # init weights and biases
theta_2 = np.random.normal(0, .1, size=(3,2)); b2 = np.zeros((1,2))

J = []
for i in range(10000):
    l1 = relu(np.dot(X, theta_1) + b1)                 # l1 = relu(X theta_1 + b1)
    y_hat = softmax(np.dot(l1, theta_2) + b2)          # y_hat = softmax(l1 theta_2 + b2)

    cost = np.sum( - (Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)) )
    J.append(cost)                                     # store cost

    dJ_dZ2 = d_softmax(y_hat, Y)
    dJ_dtheta2 = np.dot(l1.T, dJ_dZ2)                  # compute gradients
    dJ_db2 = np.sum(dJ_dZ2, axis=0, keepdims=True)

    dJ_dZ1 = np.dot(dJ_dZ2, theta_2.T) * d_relu(l1)
    dJ_db1 = np.sum(dJ_dZ1, axis=0, keepdims=True)

    theta_2 -= alpha * dJ_dtheta2                      # weight update
    b2 -= alpha * dJ_db2
    theta_1 -= alpha * np.dot(X.T, dJ_dZ1)
    b1 -= alpha * dJ_db1

    if J[-1] == 0 or J[-1] > 10: break                 # early stopping
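(Here relu, softmax, d_relu, and d_softmax are the helper functions from the class notebook; I'm assuming definitions roughly like the following, so the exact versions may differ.)

def relu(z):
    return np.maximum(0, z)                            # elementwise max(0, z)

def d_relu(a):
    return (a > 0).astype(float)                       # derivative of relu, given the activation a

def softmax(z):
    e = np.exp(z - np.max(z, axis=1, keepdims=True))   # shift by max for numerical stability
    return e / np.sum(e, axis=1, keepdims=True)

def d_softmax(y_hat, y):
    return y_hat - y                                   # combined softmax + cross-entropy gradient w.r.t. the logits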

The issue with Method #1 is that you can't set the learning rate the way you wanted us to, so I am assuming it's Method #2, but I wanted to clarify with you. Please let me know.

jordanott commented 4 years ago

Use Method #1 (Keras). We haven't talked about backprop in detail yet; you'll do Method #2 on the next homework. You can set the learning rate in the optimizer.
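For example, something along these lines (the optimizer choice and learning-rate value here are just placeholders, and depending on your Keras version the argument is lr or learning_rate):

from keras.optimizers import SGD

opt = SGD(lr=0.01)                                     # pass whatever learning rate you want to try
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])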