tensorflow / neural-structured-learning

Training neural models with structured signals.
https://www.tensorflow.org/neural_structured_learning
Apache License 2.0

How to fit using validation_set? #52

Closed greatsharma closed 4 years ago

greatsharma commented 4 years ago

Hey, I am new to NSL. I want to know how to fit an NSL model using a validation set. With a traditional Keras model I used to do this:

history = model.fit_generator(
    train_datagen.flow(X_train, y_train, batch_size=batch_size),
    validation_data=(X_valid, y_valid),
    steps_per_epoch=len(X_train) // batch_size,
    epochs=epochs,
    callbacks=callbacks,
    use_multiprocessing=True
)

When I passed validation_data to the NSL model, it threw an error. How do I do this with NSL?

I want this because my model is overfitting terribly: 99.916% accuracy on the training set and just 79.737% on the validation set when evaluated separately after training. I think it has completely memorized the training data.

Also, can you please give me some guidance on how to tweak multiplier and adv_step_size to get a model that generalizes better? My adversarial model is much worse than the base model. I am doing facial emotion recognition on three classes (happy, sad & neutral) using the FER dataset available on Kaggle.

greatsharma commented 4 years ago

Also, how do I replicate this entire setup in NSL, e.g. train_datagen, callbacks, etc.?

greatsharma commented 4 years ago

@csferng @arjung please give some guidance. There are very few resources on NSL on the internet.

csferng commented 4 years ago

Hi Gaurav,

I assume you are using nsl.keras.AdversarialRegularization since you mentioned adv_step_size. And given that your model is for image recognition, I assume train_datagen is something like tf.keras.preprocessing.image.ImageDataGenerator.

Regarding arguments for fit_generator (or fit in general), nsl.keras.AdversarialRegularization expects each example in the training and validation data to be a dictionary. To convert the (x, y) tuples generated by train_datagen into dictionaries, you may use an adapter like the following:

def convert_to_dict_generator(image_data_gen):
  # Re-wrap each (image, label) batch as the feature dictionary NSL expects.
  for image, label in image_data_gen:
    yield {'image': image, 'label': label}

model.fit_generator(
    convert_to_dict_generator(train_datagen.flow(...)),
    ...
)

or use a helper function from the standard library's itertools module:

import itertools

convert_tuple_to_dict = lambda image, label: {'image': image, 'label': label}

model.fit_generator(
    itertools.starmap(convert_tuple_to_dict, train_datagen.flow(...)),
    ...
)

For validation data, you can likewise use an ImageDataGenerator with the same adapter as above. Another approach is to convert the NumPy arrays to a tf.data.Dataset (more details here):

validation_data = tf.data.Dataset.from_tensor_slices({'image': X_valid, 'label': y_valid}).batch(batch_size)
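
The resulting dataset can then be passed straight to fit_generator alongside the dict-yielding training generator (a sketch reusing the names above):

model.fit_generator(
    convert_to_dict_generator(train_datagen.flow(...)),
    validation_data=validation_data,
    ...
)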

Regarding tweaking hyperparameters: adv_step_size depends on your input feature range. If your inputs are in [0, 1] (like pixel_value / 255), some value in [0.01, 0.1] might be a good start. multiplier decides how much attention the model should pay to adversarial examples (relative to ordinary examples) and depends on your problem definition. If your main focus is accuracy on ordinary (test) examples, then a value like 0.2 or 0.5 might work. Note that adversarial regularization might reduce overfitting a bit, but it cannot prevent overfitting entirely. You might also want to look at other aspects such as the optimizer, the learning rate, and the overall model capacity.
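
For instance, a minimal sketch applying the ranges above (the exact values are starting points for tuning, not recommended settings):

adv_config = nsl.configs.make_adv_reg_config(
    multiplier=0.2,      # relative weight of the adversarial loss
    adv_step_size=0.05,  # perturbation size; try values in [0.01, 0.1]
)
adv_model = nsl.keras.AdversarialRegularization(base_model, adv_config=adv_config)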

greatsharma commented 4 years ago

Thank you for your reply, @csferng.

I tried your code snippet, but got the following error:

ValueError: Please provide model inputs as a list or tuple of 2 or 3 elements: (input, target) or (input, target, sample_weights) Received {'image': <tf.Tensor: shape=(32, 48, 48, 1), dtype=float32,

My images are (48, 48, 1), passed to a CNN, and the batch size is 32.

Here is what my code looks like (the base_model is a deep CNN):

base_model = build_net(show_summary=False)
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.01)
adv_model = nsl.keras.AdversarialRegularization(base_model, adv_config=adv_config)

train_datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.15,
    height_shift_range=0.15,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
)
train_datagen.fit(X_train)

adv_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

batch_size = 32
epochs = 20

adv_model.fit_generator(
    convert_to_dict_generator(train_datagen.flow(X_train, y_train, batch_size=batch_size)),
    validation_data=tf.data.Dataset.from_tensor_slices({'image': X_valid, 'label': y_valid}).batch(batch_size),
    steps_per_epoch=len(X_train) // batch_size,
    epochs=epochs,
)

Although I know debugging others' code is not your job, I request you to look at my notebook here just once. The code is the same as above. It's very small, and you can jump directly to the modelling cells, as the initial cells contain only preprocessing. It will only take a couple of minutes. Thank you :)

csferng commented 4 years ago

The ValueError happens when the adv_model (which is a subclassed Keras model) is compiled with a generator-style input at its first fit/evaluate/predict call. The requirement to provide inputs as "a list or tuple of 2 or 3 elements" is lifted in the latest TensorFlow 2.2.0-rc3, which hopefully will become an official release soon.

For TensorFlow 2.1, a workaround is to call the adv_model with some dummy data before calling fit:

# Evaluate once on a small batch of dummy data so the model gets built.
adv_model.evaluate({'image': X_valid[:10], 'label': y_valid[:10]})
adv_model.fit_generator(...)

greatsharma commented 4 years ago

Thank you very much @csferng, it finally worked. Just a few more things: how do I call predict on adv_model? I tried the traditional Keras way, adv_model.predict(X_test), and got this error:

Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), for inputs ['image', 'label'] but instead got the following list of 1 arrays:

Then I tried adv_model.predict({"image": X_test}) and got the following error: ValueError: No data provided for "label". Need data for each key in: ['image', 'label']

But why is it asking for "label" while predicting? When testing the model we only have images, not labels.

Also, should I do predictions on the base model itself, like base_model.predict_classes(X_test)? That one works.

Also, how do I use the max_nbrs parameter?

csferng commented 4 years ago

For predicting without label information, please use the base_model instead.
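
For example (a small sketch, assuming the adv_model/base_model setup from your snippet): the wrapped base_model shares its weights with adv_model, so after training it already reflects the adversarial regularization and accepts plain arrays:

predictions = base_model.predict(X_test)         # class probabilities; no 'label' key needed
predicted_classes = predictions.argmax(axis=-1)  # predicted class indices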

Regarding max_nbrs: it specifies the number of neighbors to be considered in graph regularization. Graph regularization utilizes existing structured signals (represented by a graph of neighbors), whereas adversarial regularization generates the neighbors from adversarial perturbations. Please see our tutorial on graph regularization here.
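
As a rough sketch of how that looks in code (hypothetical values; max_neighbors in nsl.configs.make_graph_reg_config plays the same neighbor-capping role that max_nbrs does when packing neighbors with nsl.tools.pack_nbrs):

graph_reg_config = nsl.configs.make_graph_reg_config(
    max_neighbors=2,  # consider at most 2 neighbors per example
    multiplier=0.1,   # weight of the graph regularization loss
)
graph_model = nsl.keras.GraphRegularization(base_model, graph_reg_config)
graph_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])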

greatsharma commented 4 years ago

OK, thank you :)