talhanai / redbud-tree-depression

scripts to model depression in speech and text

Question about validation and test data #9

Closed clintonlau closed 3 years ago

clintonlau commented 3 years ago

Hi @talhanai,

I hope you can help me out with a question about your trainLSTM.py code.

# train model
model.fit(X_train, Y_train,
            batch_size=batch_size,
            epochs=epochs,
            validation_data=(X_dev, Y_dev),
            class_weight=cweight,
            callbacks=callbacks_list)

# load best model and evaluate
model.load_weights(filepath=filepath_best)

# gotta compile it
model.compile(loss=loss,
            optimizer=sgd,
            metrics=['accuracy'])

# return predictions of best model
pred        = model.predict(X_dev,   batch_size=None, verbose=0, steps=None)
pred_train  = model.predict(X_train, batch_size=None, verbose=0, steps=None)

return pred, pred_train
# 5. evaluate performance
f1 = metrics.f1_score(Y_dev, np.round(pred), pos_label=1)

In particular, I am having trouble understanding why you are using X_dev and Y_dev as both the validation data and the test data. Using them for both validation and testing would result in data leakage.

From reading your paper, I understand that you were only working with the training set and development set of the DAIC dataset. So here, I am assuming that X_train, Y_train are from the training set and X_dev and Y_dev are from the development set.

Any insights would be very much appreciated!

talhanai commented 3 years ago

Hi @clintonlau

If I recall, that was probably to evaluate the model and print performance during training. From the Keras documentation (https://keras.io/api/models/model_training_apis/#fit-method):

validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data.

The test set for this particular dataset does not have ground-truth labels (at least it didn't when I worked on it), so the dev set effectively served as the test set when reporting results.
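
For illustration, a minimal sketch of that pattern, with random stand-in data, an illustrative one-layer LSTM, and a made-up checkpoint path (none of this is the repo's actual configuration): the dev set is passed as validation_data so its metrics are printed after each epoch and used to pick the best checkpoint, and the same dev set is then scored once the best weights are reloaded.

# minimal sketch of the pattern described above; shapes, layer sizes, and
# the checkpoint path are illustrative, not the repo's actual settings
import numpy as np
from sklearn import metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

# random stand-in data: 20 time steps of 40-dim features per subject
X_train = np.random.randn(64, 20, 40).astype('float32')
Y_train = np.random.randint(0, 2, size=(64, 1))
X_dev   = np.random.randn(16, 20, 40).astype('float32')
Y_dev   = np.random.randint(0, 2, size=(16, 1))

model = Sequential([Input(shape=(20, 40)),
                    LSTM(32),
                    Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# the dev set is never trained on: keras only computes its loss/metrics at
# the end of each epoch, and the callback keeps the best-scoring weights
checkpoint = ModelCheckpoint('best.weights.h5', monitor='val_loss',
                             save_best_only=True, save_weights_only=True)
model.fit(X_train, Y_train,
          batch_size=16,
          epochs=5,
          validation_data=(X_dev, Y_dev),
          callbacks=[checkpoint])

# reload the best checkpoint and report results on the dev set, which here
# doubles as the test set because no labelled test split is available
model.load_weights('best.weights.h5')
pred = model.predict(X_dev, verbose=0)
print(metrics.f1_score(Y_dev.ravel(), np.round(pred).ravel(), pos_label=1))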

clintonlau commented 3 years ago

Oh yes, my bad, I didn't mean data leakage. My initial assumption was that you were using the dev set as an unseen held-out set and splitting the training set into train/val splits, but instead you used the dev set for both hyperparameter tuning and the final evaluation reported in the paper, since the labelled test set was not available.

Thanks for the clarification.

talhanai commented 3 years ago

How was the dev set used for hyperparameter tuning?

clintonlau commented 3 years ago

Please correct me if I am wrong, but I see the dev set as also being used for hyperparameter tuning, since it was passed to the fit() method and used to evaluate the model after each epoch. The model was not trained on the dev set directly, but you could monitor the validation results (loss, accuracy, etc.) during training, which indirectly influences your design decisions for the hyperparameters.
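
For comparison, the setup I had originally assumed would look roughly like the sketch below (random stand-in data, illustrative sizes and paths, nothing taken from your actual code): an internal train/val split carved out of the training set drives checkpointing and tuning, and the dev set is scored only once at the very end.

# rough sketch of an internal train/val split, with the dev set kept as a
# one-shot held-out set; all shapes, sizes, and paths are illustrative
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

X_train = np.random.randn(64, 20, 40).astype('float32')
Y_train = np.random.randint(0, 2, size=(64, 1))
X_dev   = np.random.randn(16, 20, 40).astype('float32')
Y_dev   = np.random.randint(0, 2, size=(16, 1))

# carve a validation split out of the training set; checkpoint selection and
# any hyperparameter choices are driven by this split, not by the dev set
X_tr, X_val, Y_tr, Y_val = train_test_split(
    X_train, Y_train, test_size=0.2, stratify=Y_train.ravel(), random_state=0)

model = Sequential([Input(shape=(20, 40)),
                    LSTM(32),
                    Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

checkpoint = ModelCheckpoint('best.weights.h5', monitor='val_loss',
                             save_best_only=True, save_weights_only=True)
model.fit(X_tr, Y_tr,
          batch_size=16,
          epochs=5,
          validation_data=(X_val, Y_val),
          callbacks=[checkpoint])

# the dev set is scored exactly once, after all tuning decisions are made
model.load_weights('best.weights.h5')
pred = model.predict(X_dev, verbose=0)
print(metrics.f1_score(Y_dev.ravel(), np.round(pred).ravel(), pos_label=1))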