Hey. Just going to fill in some of the conversation here from elsewhere.
It's not really that there's a "bug" here; it's that training LSTMs is simply much harder than training the "batteries-included" WaveNet architecture I also provide in this repo, and nothing demands that the universe make the two perform equally well. Recurrent models are just a pain to train compared to other architectures, and that's one of the main reasons basically no one uses them anymore.
What you mentioned about adding more parameters not helping also tracks: there's no guarantee that spending more compute on a model gets you better performance. Sometimes it does, and the fact that it pays off more reliably is one of the reasons people keep pouring compute into more modern architectures like transformers for "AI" applications, while other approaches don't reward the investment the same way.
That's not to say there aren't better hyperparameters for the LSTM than the ones I've scribbled in; I just haven't found them.
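If anyone wants to hunt for them, here's a rough sketch of what a small sweep could look like in plain PyTorch. This is not the trainer's actual config schema, and the hidden sizes, layer counts, and learning rates below are illustrative guesses rather than values from this repo:

```python
# Illustrative only -- plain PyTorch, not the NAM trainer's config schema.
# hidden_size / num_layers are standard torch.nn.LSTM arguments; the values
# and learning rates below are guesses, not settings from this repo.
import itertools

import torch

hidden_sizes = (16, 24, 32)
layer_counts = (1, 2)
learning_rates = (3e-4, 1e-3)

for hidden_size, num_layers, lr in itertools.product(
    hidden_sizes, layer_counts, learning_rates
):
    # Mono audio in, mono audio out: an LSTM followed by a linear head,
    # similar in spirit to a recurrent amp model.
    lstm = torch.nn.LSTM(
        input_size=1,
        hidden_size=hidden_size,
        num_layers=num_layers,
        batch_first=True,
    )
    head = torch.nn.Linear(hidden_size, 1)
    optimizer = torch.optim.Adam(
        list(lstm.parameters()) + list(head.parameters()), lr=lr
    )
    # ...train on (input, target) re-amp pairs here and record ESR per combo...
    print(f"hidden_size={hidden_size}, num_layers={num_layers}, lr={lr}")
```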
In short, for every week I waste trying to make a recurrent architecture better, I can usually match the gains with a convnet architecture in an hour 😉.
I'm going to close this, but if you do find better hyperparameters and want to share them in a PR, then that would be welcome 🙂
Describe the bug
Training of LSTM models currently seems to have issues. Using the default parameters in the GUI to train the same re-amp as a model trained with the default WaveNet parameters yielded wildly different results. I also tried the same thing in the CLI version with a different re-amp, and while some models seemed to train correctly, there was still some odd behavior.
To Reproduce
I have run replicates and gotten the same results multiple times. I used the standard NAM v3 input file as well as some custom inputs (such as the Aida-X input file, since they use an LSTM for their RT-Neural implementation). I tried ny values of both 8192 and 32768, as suggested in the config files. I also tried larger LSTM models, e.g. 32 hidden units and 2 layers. The increase in size did not help; in fact it made the ESR worse, as seen in the picture above.
Desktop (please complete the following information):
- OS: Windows 11 24H2, Anaconda PowerShell
- Interface: both GUI and CLI
- NAM Trainer v0.10.0