Some higher-gain reamps with a lot of high-end have proved trickier to nail.
While I can always up the channels, dilations, or kernel size in one (or both) of the layers, I have been trying to find ways to avoid the slow (or non-existent) progress in ESR after roughly the 700-epoch mark when using the STANDARD architecture in the trainer.
Having spent a bit of time picking ChatGPT's brain and experimenting with various optimizers (RAdam, AdamW, etc.), making use of warm restarts seems to help a little.
Over the course of 700 epochs, it takes the ESR down from 0.009798 (default scheduling) to 0.008789 (with CosineAnnealingWarmRestarts) - a rough sketch of the scheduler swap is below.
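For anyone wanting to experiment along the same lines, this is roughly what the swap looks like in plain PyTorch. The model, data, and loss here are placeholders rather than the actual trainer internals, and the T_0 / T_mult / eta_min values are illustrative guesses - they are exactly the knobs that still need fine-tuning:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Stand-in model and data just so the loop runs; not the NAM architecture.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
x = torch.randn(256, 64)
y = torch.randn(256, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)

# T_0: epochs before the first restart; T_mult: each subsequent cycle is
# T_mult times longer; eta_min: the floor the LR decays to within a cycle.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=100, T_mult=2, eta_min=1e-6)

for epoch in range(700):
    optimizer.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)  # placeholder for the ESR loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the cosine/restart schedule once per epoch
```

How quickly the ESR recovers after each restart depends heavily on T_0 and T_mult, which is where the fine-tuning mentioned below comes in.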
That said, there are caveats and some fine-tuning that need to go with it - I'll post back if I find a good combination. First attempts look promising, but after each restart the ESR still takes a while to drop below its previous plateau: