`BatchNormalization` outputs with a more dramatic val vs. train difference:

One layer zoomed:

Probably resolved: I had inadvertently left in one non-standardized sample (in a batch of 32), with sigma = 52, which severely disrupted the BN layers; after standardizing it, I no longer observe a strong discrepancy between train & inference modes - if anything, any differences are difficult to spot.
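For reference, a minimal sketch of the kind of per-sample standardization (with an outlier check) that would have caught this - the helper name and `tol` threshold are mine, not from the original pipeline:

```python
import numpy as np

def standardize_batch(batch, tol=5.0):
    """Standardize each sample to zero mean / unit variance; flag samples
    whose raw sigma is far off (e.g. the sigma=52 sample that broke BN)."""
    mu = batch.mean(axis=(1, 2), keepdims=True)
    sigma = batch.std(axis=(1, 2), keepdims=True)
    for i, s in enumerate(sigma.ravel()):
        if s > tol:
            print(f"sample {i}: sigma={s:.1f} - not standardized?")
    return (batch - mu) / (sigma + 1e-7)

# batch.shape == (32, 12000, 1): one 30-sec segment per dataset
```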
I implemented this paper's neural net, with some differences (img below), for EEG classification; `train_on_batch` performance is excellent, with very low loss - but `test_on_batch` performance, though on the same data, is poor: the net predicts '1' most of the time:

Data is fed as 30-sec segments (12000 timesteps each; 10 mins per dataset) from 32 (= batch_size) datasets at once (img below), as sketched below.
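To make the feeding scheme concrete, a rough sketch - `get_batch` is a hypothetical stand-in for my loader, and `model` is a compiled stateful Keras model (one possible layout is sketched under "Additional details" below):

```python
import numpy as np

def get_batch(seg, n_datasets=32, timesteps=12000):
    """Hypothetical loader: the seg-th 30-sec window of each dataset."""
    x = np.random.randn(n_datasets, timesteps, 1)   # stand-in for real EEG
    y = np.random.randint(0, 2, (n_datasets, 1))
    return x, y

n_segments = 20  # 10 min per dataset / 30-sec segments
for seg in range(n_segments):
    x, y = get_batch(seg)
    loss_train = model.train_on_batch(x, y)  # BN & dropout in train mode

# Same data, but BN uses its moving averages and dropout is off:
loss_test = model.test_on_batch(x, y)
model.reset_states()  # stateful LSTM: clear states before the next 32 datasets
```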
Any remedy?
Troubleshooting attempted:
Additional details:
- `BatchNormalization` after every CNN & LSTM layer
- `reset_states()` applied between different datasets (layout sketched below)
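A minimal sketch of that layout - not the actual architecture (layer sizes and counts are placeholders), just the BN placement and the stateful-LSTM reset pattern:

```python
from tensorflow.keras.layers import (Input, Conv1D, LSTM, Dense,
                                     Activation, BatchNormalization)
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(32, 12000, 1))  # fixed batch size: stateful LSTM
x = Conv1D(32, 16, strides=8)(ipt)
x = BatchNormalization()(x)              # BN after the CNN layer
x = Activation('relu')(x)
x = LSTM(64, stateful=True)(x)
x = BatchNormalization()(x)              # BN after the LSTM layer
out = Dense(1, activation='sigmoid')(x)

model = Model(ipt, out)
model.compile('adam', 'binary_crossentropy')

# ...after the last segment of the current 32 datasets:
model.reset_states()
```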
UPDATE: Progress was made; `batch_normalization` and `dropout` are the major culprits. Major changes:

- `sample_weights` to reflect class imbalance - varied between 0.75 and 2 (sketch below)

Considerable improvements were observed - but not nearly total. Train vs. validation loss behavior is truly bizarre - flipping class predictions, and bombing the exact same datasets it had just trained on:
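A minimal sketch of one way to build such per-sample weights, assuming binary labels and the 0.75 / 2 endpoints quoted above (the helper name is mine):

```python
import numpy as np

def make_sample_weights(y, weight_0=0.75, weight_1=2.0):
    """Weight each sample by its class to counter the imbalance."""
    y = np.asarray(y).ravel()
    return np.where(y == 1, weight_1, weight_0)

# Usage: model.train_on_batch(x, y, sample_weight=make_sample_weights(y))
```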
Also, `BatchNormalization` outputs during train vs. test time differ considerably (img below).

UPDATE 2: All other suspicions were ruled out: `BatchNormalization` is the culprit. Using Self-Normalizing Networks (SNNs) with SELU & AlphaDropout in place of `BatchNormalization` yields stable and consistent results (minimal sketch below).
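A minimal sketch of the SNN-style substitution, assuming standard Keras pieces (`selu` activation, `lecun_normal` init, `AlphaDropout`); sizes are placeholders and the dropout rate is illustrative:

```python
from tensorflow.keras.layers import Input, Conv1D, LSTM, Dense, AlphaDropout
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(32, 12000, 1))
# SELU + lecun_normal keeps activations near zero mean / unit variance,
# removing the need for BatchNormalization
x = Conv1D(32, 16, strides=8, activation='selu',
           kernel_initializer='lecun_normal')(ipt)
x = AlphaDropout(0.1)(x)   # variance-preserving dropout for SELU nets
x = LSTM(64, stateful=True)(x)
x = AlphaDropout(0.1)(x)
out = Dense(1, activation='sigmoid')(x)

model = Model(ipt, out)
model.compile('adam', 'binary_crossentropy')
```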