lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers

pre_norm_has_final_norm kwarg not used #202

Closed sashakunitsyn closed 11 months ago

sashakunitsyn commented 11 months ago

Not sure if this is a bug or a feature, but this kwarg https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L985 is set but never used. Maybe it should be used here as an additional condition: https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L1137?
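
For illustration, a minimal sketch of what the suggested condition could look like. The class structure and attribute names here are simplified assumptions based on the linked lines, not the library's actual code:

```python
import torch.nn as nn

# Simplified stand-in for AttentionLayers.__init__ (hypothetical names,
# mirroring the issue rather than the real x-transformers internals).
class AttentionLayersSketch(nn.Module):
    def __init__(self, dim, pre_norm=True, pre_norm_has_final_norm=True):
        super().__init__()
        self.pre_norm = pre_norm

        # current behaviour (roughly): the final norm is gated only on pre_norm,
        # so pre_norm_has_final_norm is accepted but never consulted
        # self.final_norm = nn.LayerNorm(dim) if pre_norm else nn.Identity()

        # suggested change: also gate on the kwarg
        self.final_norm = (
            nn.LayerNorm(dim)
            if (pre_norm and pre_norm_has_final_norm)
            else nn.Identity()
        )
```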

lucidrains commented 11 months ago

@sashakunitsyn oh yes, i was initially using that when dealing with the ResiDual paper, which had an exotic pre + post-norm combination
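
For context, a rough sketch of that dual pre + post-norm scheme as described in the ResiDual paper, based on my own simplified reading of it (this is not the removed x-transformers code):

```python
import torch
import torch.nn as nn

def residual_layer(x_post, x_pre, block, norm):
    # ResiDual-style layer: maintain two residual streams in parallel
    out = block(x_post)
    x_post = norm(x_post + out)  # post-norm stream: normalized after every layer
    x_pre = x_pre + out          # pre-norm-style stream: raw accumulation, normed once at the end
    return x_post, x_pre

dim = 8
norm = nn.LayerNorm(dim)
block = nn.Linear(dim, dim)      # stand-in for an attention / feedforward block
x = torch.randn(2, 4, dim)

x_post, x_pre = residual_layer(x, x, block, norm)
# after the final layer the two streams are combined, roughly:
final = x_post + nn.LayerNorm(dim)(x_pre)
```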

removed it for clarity! thank you!