lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers

pre_norm_has_final_norm kwarg not used #202

Closed sashakunitsyn closed 11 months ago

sashakunitsyn commented 11 months ago

Not sure if this is a bug or a feature, but this kwarg https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L985 is set but never used. Maybe it should be used here as an additional condition: https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L1137?
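
For illustration, a minimal sketch of what the suggested condition could look like. The class structure and attribute names here are simplified assumptions based on the linked lines, not the library's actual code:

```python
import torch.nn as nn

# Simplified stand-in for AttentionLayers.__init__ (hypothetical names,
# mirroring the issue rather than the real x-transformers internals).
class AttentionLayersSketch(nn.Module):
    def __init__(self, dim, pre_norm=True, pre_norm_has_final_norm=True):
        super().__init__()
        self.pre_norm = pre_norm

        # current behaviour (roughly): the final norm is gated only on pre_norm,
        # so pre_norm_has_final_norm is accepted but never consulted
        # self.final_norm = nn.LayerNorm(dim) if pre_norm else nn.Identity()

        # suggested change: also gate on the kwarg
        self.final_norm = (
            nn.LayerNorm(dim)
            if (pre_norm and pre_norm_has_final_norm)
            else nn.Identity()
        )
```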

lucidrains commented 11 months ago

@sashakunitsyn oh yes, i was initially using that when dealing with the ResiDual paper, which had an exotic pre + post-norm combination
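
For context, a rough sketch of that dual pre + post-norm scheme as described in the ResiDual paper, based on my own simplified reading of it (this is not the removed x-transformers code):

```python
import torch
import torch.nn as nn

def residual_layer(x_post, x_pre, block, norm):
    # ResiDual-style layer: maintain two residual streams in parallel
    out = block(x_post)
    x_post = norm(x_post + out)  # post-norm stream: normalized after every layer
    x_pre = x_pre + out          # pre-norm-style stream: raw accumulation, normed once at the end
    return x_post, x_pre

dim = 8
norm = nn.LayerNorm(dim)
block = nn.Linear(dim, dim)      # stand-in for an attention / feedforward block
x = torch.randn(2, 4, dim)

x_post, x_pre = residual_layer(x, x, block, norm)
# after the final layer the two streams are combined, roughly:
final = x_post + nn.LayerNorm(dim)(x_pre)
```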

removed it for clarity! thank you!