Open: Rabia-Metis opened this issue 6 years ago
Have you tried a simpler model and seen if it is able to learn on your dataset? Are the train and validation images from the same dataset?
Yes, I have tried ResNet-152 and SENet and achieved average accuracy above 80%. Yes, both are from the same dataset.
Interesting. What type of preprocessing are you using? Mean/std or -1 to 1? Also, are you using the default regularization value? Try setting it to 0.
I tried both with preprocessing (subtracting the mean and dividing by the standard deviation) and without. I also tried setting weight_decay=0 as you suggested, but it's not making any difference:
```python
model = NASNet(classes=2, input_shape=(224, 224, 3), weights=None,
               penultimate_filters=4032, nb_blocks=6,
               use_auxiliary_branch=True, skip_reduction=False,
               weight_decay=0)
```
You can view the logs here https://www.floydhub.com/ptanikon2/projects/n-net/3
Why not try fine-tuning one of the pretrained NASNet blocks rather than training from scratch? Since you seem to have significant compute available, I suggest using NASNet Mobile as a base and adding layers on top to build your final classifier.
Also, use the -1 to 1 preprocessing for NASNets, especially when fine-tuning with pretrained weights.
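For reference, the -1 to 1 scaling is just a linear rescale of the uint8 pixel range. A minimal sketch in plain NumPy (the function name is mine, not from the repo):

```python
import numpy as np

def scale_minus_one_to_one(x):
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return x.astype("float32") / 127.5 - 1.0

pixels = np.array([0, 127, 255], dtype=np.uint8)
scaled = scale_minus_one_to_one(pixels)  # endpoints map to exactly -1.0 and 1.0
```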
Okay, but weights are not available for NASNET_LARGE_WEIGHT_PATH_WITH_auxiliary_NO_TOP = "https://github.com/titu1994/Keras-NASNet/releases/download/v1.1/NASNet-auxiliary-large-no-top.h5". And when I don't use the auxiliary branch and use NASNet-large-no-top.h5 instead, it gives me the following error:
Use NASNet Mobile, not Large. Weights for all models are available. Refer to the Keras blog post on fine-tuning to see how to use no-top models for training.
I have tried NASNet Mobile, but its results are also not good (attached). Kindly suggest what else I can try. Secondly, there is a typo in the auxiliary weights path due to which the weights were not loading earlier: change 'auxiliary' to 'auxilary' in the weights path.
Did you find a solution for your problem? I've encountered a similar problem, and here is my post on Stack Overflow.
@maystroh No, not yet. Let me know too if you find any solution
Have you tried training it with the auxiliary loss for both the mobile and large models ? It is a strong regularizer which is required for training NASNet-A Model from scratch (not as important when just fine-tuning the Dense layers that you add).
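When a model has an auxiliary head, training minimizes a weighted sum of the two losses, with the auxiliary loss down-weighted so it regularizes without dominating. A tiny sketch of that arithmetic (the 0.4 weight follows the Inception/NASNet convention; treat the exact value as an assumption here):

```python
def total_loss(main_loss, aux_loss, aux_weight=0.4):
    """Weighted sum used when training with an auxiliary classifier.

    The auxiliary head's loss is scaled down (0.4 here, an assumed
    convention) so it acts as a regularizer on the shared features
    rather than as a second full objective.
    """
    return main_loss + aux_weight * aux_loss

# Example with arbitrary loss values for the two heads
loss = total_loss(main_loss=0.7452, aux_loss=0.1686)
```

In Keras this corresponds to compiling a two-output model with `loss_weights` (e.g. `loss_weights=[1.0, 0.4]`), so the framework computes this sum for you.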
Yes @titu1994, I have tried the auxiliary branch with both, but it's not making any difference either.
I don't really have an answer then. It's either a problem with the regularizer strength, or with the auxiliary branch not working, or perhaps the model is so large that it's overfitting.
The fact that the Mobile version doesn't do better either points to some other problem, though. Can you give more information about the dataset: number of samples, size of images, the task you are performing (classification, regression, or bounding box regression), and so on?
@titu1994 I experienced the same issue. You can see the training history in the middle of my Jupyter notebook.
@Agent007 An 87-million-parameter model for a dataset of 6600 images. I would say that is cause for concern for any large model, not just NASNet. Use the NASNet Mobile version and see if that reduces the error rate.
With your validation and test set scores being so low compared to training, I think there is something wrong with the data itself.
In cases from this thread, the discrepancy isn't that vast. The reduced performance is a concern, but your case is different.
My network learned from data. I used CIFAR 768 model but
I was reading through the paper again. They used a lot of tricks to get such high performance. Just skimming through a few:
I think the two major regularizers are DropPath and the auxiliary branch. Auxiliary branches are significant regularizers, and you don't usually graft such an unusual branch onto a model unless it significantly impacts the learning process.
Also, Keras in general does not seem to be able to exactly match the performance of PyTorch or base TensorFlow models (even when using TensorFlow as the backend) for some reason. Random initialization differs, and that's fine, but a 2-4% difference is a little weird. I try to match the papers almost word for word in Keras, down to the decay, initializers, and biases, but the result always seems to be 2-4% below what the paper claimed.
To date, I have never been able to get a basic ResNet-50 to the level of performance claimed in the original paper using Keras. TF manages to get close enough, though (with a 0.035% absolute difference, which is probably due to random initialization).
That is weird. Not sure why that would be. In my example I used Cutout and some other augmentation techniques and was still that far off. If their model is that dependent on an annealed schedule and the auxiliary branch, then I wouldn't say there is anything special about the network itself.
Also, what is the topology of an auxiliary branch? I read an updated paper that used NASNet, and all it talked about was Normal and Reduction cells. It would be cool to see a plot_model output or something similar to get a feel for what it is actually doing.
Thanks for the insights, by the way.
I'm having a hard time interpreting this cloud with the dots in the middle of it, and I can't find what it means in the paper. As a result, I don't understand the difference between h_i and h_{i-1}. Is the cloud performing some type of operation?
@pGit1 It's meant to show how multiple layers of Normal Cells should be connected when stacked on top of each other. In this case, there are skip connections from the input of the previous Normal Cell to the convolutions within the current Normal Cell.
@Agent007 I am not sure I am following. So h_i is the concatenated output of h_{i-1}, and the dotted lines represent the concatenated output of h_{i-2}?
H_i is the output of your current cell. H_i-1 is the input to your current cell. H_i-2 is the input to the cell prior to your current cell.
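The wiring described above can be sketched with stand-in operations. This is a toy NumPy illustration only: the branch ops are placeholders, not NASNet's actual separable convolutions or poolings.

```python
import numpy as np

def normal_cell(h_prev, h_prev_prev):
    """Toy NASNet-style cell: it receives BOTH the previous cell's output
    (h_{i-1}) and the output of the cell before that (h_{i-2}, the dotted
    skip connection), applies a few branches, and concatenates the branch
    outputs along the channel axis to form h_i."""
    branch_a = 0.5 * (h_prev + h_prev_prev)  # placeholder for a conv branch
    branch_b = np.maximum(h_prev, 0.0)       # placeholder for another branch
    return np.concatenate([branch_a, branch_b], axis=-1)

h_im2 = np.zeros((1, 8, 8, 4), dtype="float32")  # h_{i-2}
h_im1 = np.ones((1, 8, 8, 4), dtype="float32")   # h_{i-1}
h_i = normal_cell(h_im1, h_im2)                  # becomes h_{i-1} for the next cell
```

Because the branch outputs are concatenated, the cell output has more channels than either input, which is why the real cells include pointwise convolutions to keep the channel count in check.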
@titu1994 What layer would you freeze from when fine-tuning ImageNet weights with a (200000, 224, 224, 3) dataset across 128 classes?
@alkari I'd suggest holding off on using the Keras version of NASNet for fine-tuning purposes for the moment. There have been multiple independent reports suggesting that the weight loading mechanism was not perfect.
You can try fine-tuning the tensorflow models repository directly, for best results.
Thanks @titu1994, but wouldn't that be resolved by freezing deep enough into the network? Which is why I was wondering if there's an optimal layer to freeze from. Here's where I have it:
```python
model.trainable = True
set_trainable = False
for layer in model.layers:
    # Unfreeze everything from this layer onwards
    if layer.name == 'activation_253':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
    print("layer {} is {}".format(layer.name, '+++trainable' if layer.trainable else '---frozen'))
```
Current results so far in training without augmentation:
```
Epoch 48/378
100/100 [==============================] - 538s 5s/step - loss: 2.9667 - predictions_loss: 0.7452 - aux_predictions_loss: 0.1686 - predictions_acc: 0.7764 - aux_predictions_acc: 0.9573 - val_loss: 4.2562 - val_predictions_loss: 1.5506 - val_aux_predictions_loss: 1.3832 - val_predictions_acc: 0.5697 - val_aux_predictions_acc: 0.6290
```
Usually all layers up to the last convolution layer are frozen, and then you add a new classifier on the end. I don't know how, but it's learning something. I'm guessing that even though the weights aren't loading correctly, the trainable portion of the network is still learning something useful.
Try establishing a baseline with a small batch-norm VGG network first.
Is weight loading completely fixed after the latest commit? @titu1994
Yes, though I haven't tested the auxiliary branches yet; the weights have been ported. If they still don't work, it's gonna be a problem.
It seems like overfitting. Try a simpler model.
> @pGit1 It's meant to show how multiple layers of Normal Cells should be connected when stacked on top of each other. In this case, there are skip connections from the input of the previous Normal Cell to the convolutions within the current Normal Cell.
@Agent007 Wait, years later your answer is CRYSTAL clear. Not sure why I was overcomplicating things.
Thanks for the amazing work. I am having an issue with network learning: my NASNet model isn't learning. Training accuracy is improving, but validation accuracy isn't changing and is stuck at 0.4194. Training data = 600 images, testing data = 62 images, image shape = (224, 224, 3), epochs = 10-15.