lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Removing biases breaks pre-trained models #225

Closed · zqevans closed this issue 8 months ago

zqevans commented 9 months ago

The recent change removing biases across the model means I can't load my previously trained x-transformers models.

It would be nice if the previous behavior remained the default, for backwards compatibility, with an option to remove all of the biases.

lucidrains commented 9 months ago

@zqevans ah, i bumped a minor version though

can't you just pin it?

lucidrains commented 9 months ago

it had to happen at some point. layernorm biases were shown to make transformer training unstable
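For context, the LayerNorm part of this change means keeping the learned scale (gamma) while dropping the learned shift (beta). A minimal sketch of that idea, assuming nothing about the repo's exact implementation:

```python
import torch
from torch import nn
import torch.nn.functional as F

class LayerNormNoBias(nn.Module):
    """LayerNorm with a learned scale (gamma) but no learned shift (beta)."""
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # passing bias=None drops the additive beta term entirely
        return F.layer_norm(x, x.shape[-1:], weight=self.gamma, bias=None)
```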

zqevans commented 9 months ago

I can pin it, but that means the stable-audio-tools library can't use any features added past that point, since it's pinned.

Could you at least make the biases opt-in? That way the default is still the ideal setup and newer models get it automatically, but old ones can still be loaded with the right settings turned on.
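One possible user-side stopgap, sketched here under the assumption that the old checkpoint simply carries extra `.bias` tensors the new model no longer declares: filter them out before loading. Note that the result will not exactly reproduce the old model's outputs, since the bias contributions are lost. The checkpoint path is hypothetical, and a plain module stands in for the model to keep the sketch self-contained:

```python
import torch
from torch import nn

# `model` stands in for a freshly constructed bias-free x-transformers
# model; a plain linear layer is used only to keep this sketch runnable
model = nn.Linear(512, 512, bias=False)

# hypothetical checkpoint from a pre-change training run
state_dict = torch.load('old_checkpoint.pt', map_location='cpu')

# keep only tensors the new model still declares; the dropped `.bias`
# entries mean outputs will NOT exactly match the old model
model_keys = set(model.state_dict().keys())
filtered = {k: v for k, v in state_dict.items() if k in model_keys}
dropped = sorted(set(state_dict) - set(filtered))
print(f'dropping {len(dropped)} stale tensors, e.g. {dropped[:3]}')

model.load_state_dict(filtered)
```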

lucidrains commented 8 months ago

@zqevans yea, i think that would overly complicate the repo for little gain

why not just anneal out the biases over an extra epoch of training?
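A rough sketch of one way to read that suggestion, with the decay factor and loop placement purely illustrative: keep training while multiplying every bias parameter by a factor just below one on each step, so the biases decay toward zero and the remaining weights learn to compensate.

```python
import torch

@torch.no_grad()
def decay_biases(model, factor=0.999):
    # exponentially shrink every bias toward zero; after enough steps
    # the model behaves like its bias-free counterpart
    for name, param in model.named_parameters():
        if name.endswith('.bias'):
            param.mul_(factor)

# illustrative placement in a training loop:
#   loss.backward()
#   optimizer.step()
#   decay_biases(model)   # anneal after each update
# once the biases are effectively zero, export a state dict without
# the `.bias` entries and load it into the new bias-free model
```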

lucidrains commented 8 months ago

@zqevans btw, how are the new audio results looking?