Closed jake-g closed 6 years ago
Hi, so it's not even transferring a model, but learning a new model from scratch after changing the pre-computed melgram to on-the-fly ones with kapre? Then I think the input range could matter. Have you tried putting a BN layer after the kapre melgram layer? In my experience, non-zero-mean inputs sometimes didn't work very well.
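To make the "non-zero-mean inputs" point concrete, here is a minimal numpy sketch (all values hypothetical): a dB-scaled melgram lives roughly in [-80, 0] dB, so its mean is strongly negative, and per-frequency-bin standardization, roughly what a BatchNormalization layer placed right after the melgram layer would learn, recenters it to zero mean and unit variance.

```python
import numpy as np

# Hypothetical batch of dB-scale melgrams: values in roughly [-80, 0] dB,
# so the raw input mean is far from zero.
rng = np.random.default_rng(0)
melgram_db = rng.uniform(-80.0, 0.0, size=(8, 64, 128))  # (batch, mels, frames)
print(melgram_db.mean())  # strongly negative, not near 0

# Per-frequency-bin standardization, similar in spirit to what a BN layer
# after the melgram layer would converge toward:
mean = melgram_db.mean(axis=(0, 2), keepdims=True)
std = melgram_db.std(axis=(0, 2), keepdims=True)
normalized = (melgram_db - mean) / (std + 1e-8)
print(normalized.mean(), normalized.std())  # ~0 and ~1
```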
Well, I'm not trying to reuse my weights from the original model; I'm retraining from scratch, but I still expect a model of similar performance to be learned since the input is very similar. I'll try the BN layer and report back, thanks.
UPDATE: added BN after the melgram layer, before the first conv, with axis=3 (the channel axis). The same issue occurred: pretty much an identical training/validation curve as above. My original model, with and without precomputed melgram frequency normalization, trains much better.
Fixed! It was indeed a me problem
Turns out the way I was creating the pickled raw audio numpy arrays was flawed, so when they were passed to the Melspectrogram layer, the result was pretty much all -inf or -80 dB, which obviously wouldn't train well.
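A simple sanity check before pickling would have caught this. Here is a hedged sketch (the `check_audio` helper and its threshold are hypothetical, not from the original code): a near-zero waveform turns into an almost constant -80 dB melgram, which carries no information for the network.

```python
import numpy as np

def check_audio(x, floor=1e-4):
    """Hypothetical sanity check to run before pickling raw audio arrays."""
    x = np.asarray(x, dtype=np.float32)
    assert np.isfinite(x).all(), "NaN/inf samples"
    # A (near) silent clip would produce an all -80 dB / -inf melgram.
    assert np.abs(x).max() > floor, "waveform is (near) silent"
    return x

# A real clip passes the check; np.zeros(16000) would raise AssertionError.
ok = check_audio(0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000))
```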
When investigating and plotting the kapre melgram output in my original post, I loaded the wav file directly, so the flaw mentioned above was avoided.
Anyways, it looks like the kapre version is training very similarly to my original model.
Glad to hear that :)
I'm a bit stumped on this. TLDR: I'm getting weird behavior using kapre as a replacement for spectrogram feats.
I have a model, and traditionally I have precomputed 64-mel x 128-frame specs and fed them into the first layer. I tried integrating kapre because it seemed like a great idea (and I still think it is). I added the kapre mel spec layer tuned the same way I was generating my precomputed ones (librosa based), with nothing about it trainable. My new input was pickled raw mono 16 kHz wav files (~5 seconds).
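For reference, a minimal sketch of how that input could be laid out (the batch size and variable names here are hypothetical; older kapre versions expect channels-first mono audio of shape `(batch, 1, time)`):

```python
import numpy as np

SR = 16000          # sample rate from the post
N_SAMPLES = 79872   # per-clip length used in the post, just under 5 s

# Hypothetical batch of 4 raw mono clips in channels-first layout:
batch = np.zeros((4, 1, N_SAMPLES), dtype=np.float32)
print(batch.shape)     # (4, 1, 79872)
print(N_SAMPLES / SR)  # ~4.99 seconds per clip
```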
I started training and noticed there was very little learning taking place compared to the original model. I poked around, tried adjusting my input shape, and made sure it was
(None, 1, 79872)
where ~80k was the number of samples per wav. I also did a similar spectrogram comparison as in the examples/ and the kapre version looked nearly identical to my original. The values were scaled slightly differently, but more or less contained the same information. For example
[ -14.019988 -11.445856]
became [ -51.93689 -49.89946 ]
; see the attached specs for comparison: Original Version
Kapre Version
They basically look the same, which is why I'm confused/surprised the kapre version doesn't train. I tried with and without normalization (frequency-wise), transposing, and scaling differently, and I always have the same issue: they all seem to stop improving after a few epochs. My original model trained for > 50 epochs before it stopped improving.
At this point I'm trying to figure out if this is a me problem or something going on with kapre, so I figured it was worth a shot asking. Thanks for any help resolving this!
Lastly, here is a snippet of my model summary for the old model and the new kapre one.
Similarly, the kapre version looked like this, identical other than the first few layers:
Finally, here is a validation metric plot where the gray line is the kapre version.