lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

Update x_transformers.py #231

Closed · notprime closed this 6 months ago

notprime commented 6 months ago

ViTransformerWrapper was missing the self.post_emb_norm layer at line 1460, which threw an error on the forward pass.
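
For reference, a minimal stand-alone sketch of the missing piece, assuming ViTransformerWrapper follows the same convention as TransformerWrapper (a post_emb_norm flag selecting LayerNorm or Identity):

```python
import torch
import torch.nn as nn

# stand-in mirroring the missing attribute: a post-embedding norm applied to the
# patch embeddings before they go through the attention layers
class PatchEmbedPostNorm(nn.Module):
    def __init__(self, dim, post_emb_norm = True):
        super().__init__()
        # same convention as TransformerWrapper: LayerNorm when enabled, otherwise a no-op
        self.post_emb_norm = nn.LayerNorm(dim) if post_emb_norm else nn.Identity()

    def forward(self, x):
        return self.post_emb_norm(x)

x = torch.randn(2, 64, 512)               # (batch, num patches, dim)
print(PatchEmbedPostNorm(512)(x).shape)   # torch.Size([2, 64, 512])
```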

lucidrains commented 6 months ago

oops, ty

notprime commented 6 months ago

@lucidrains no worries Phil, it's been a pleasure! btw, I was thinking: sometimes it makes sense to compute a loss not only on the logits/predictions (as with cross entropy) but also on the embeddings (e.g. ArcFace and similar losses). Looking at the code, the embeddings are returned only if the mlp head does not exist or if return_embeddings is set to True, so if someone wants both the logits and the embeddings they have to run the forward pass twice (once with return_embeddings set to True, once with it set to False); see the sketch below. Would it make sense to you to modify the class a bit so it can return both when needed? I've seen lots of papers using a combination of cross entropy and ArcFace, for example. I can easily handle it if that's fine with you.
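
This is roughly what I mean by the double forward pass (illustrative only; the ViTransformerWrapper arguments here are just placeholders):

```python
import torch
from x_transformers import ViTransformerWrapper, Encoder

vit = ViTransformerWrapper(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    attn_layers = Encoder(dim = 512, depth = 6, heads = 8)
)

imgs = torch.randn(4, 3, 256, 256)

logits     = vit(imgs)                            # first pass: mlp head output
embeddings = vit(imgs, return_embeddings = True)  # second pass: just for the embeddings
```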

lucidrains commented 6 months ago

@notprime what is ArcFace?

lucidrains commented 6 months ago

yea in the middle of something, but i can add that quickly (famous last words)

lucidrains commented 6 months ago

@notprime oh i have it already, as return_logits_and_embeddings = True. is this what you are looking for?

notprime commented 6 months ago

@lucidrains,

ArcFace paper: https://arxiv.org/pdf/1801.07698.pdf. It is basically a loss function mainly used in biometrics settings (it was first used for face recognition, but it can basically be used in any classification setting). Instead of only separating classes through softmax, you also project the embeddings so as to minimize the intra-class distance and maximize the inter-class distance.
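
Roughly, a minimal self-contained sketch of the idea (my own simplified version, not the paper's official code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Additive angular margin loss: cosine logits with a margin added to the target class."""
    def __init__(self, embed_dim, num_classes, scale = 64., margin = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.scale = scale
        self.margin = margin

    def forward(self, embeddings, labels):
        # cosine similarity between l2-normalized embeddings and class centers
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # add the angular margin only to the ground-truth class
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.shape[-1]).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(logits * self.scale, labels)

# quick check
loss_fn = ArcFaceLoss(embed_dim = 512, num_classes = 1000)
loss = loss_fn(torch.randn(4, 512), torch.randint(0, 1000, (4,)))
```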

Yes, return_logits_and_embeddings is what I was thinking about, but I've seen it only in the TransformerWrapper class

lucidrains commented 6 months ago

@notprime ohh i see! yea let me throw it in there

lucidrains commented 6 months ago

@notprime ok done

notprime commented 6 months ago

@lucidrains saw it, thanks!
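
For reference, roughly how I plan to use it, assuming ViTransformerWrapper now accepts return_logits_and_embeddings the same way TransformerWrapper does:

```python
import torch
import torch.nn.functional as F
from x_transformers import ViTransformerWrapper, Encoder

vit = ViTransformerWrapper(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    attn_layers = Encoder(dim = 512, depth = 6, heads = 8)
)

imgs   = torch.randn(4, 3, 256, 256)
labels = torch.randint(0, 1000, (4,))

# single forward pass returning both
logits, embeddings = vit(imgs, return_logits_and_embeddings = True)

# softmax loss on the logits; the embeddings can additionally feed a metric loss
# like the ArcFace sketch above (they may need pooling over the patch dimension first)
loss = F.cross_entropy(logits, labels)
```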

I'll be working heavily with your vit repo these days; I'll open issues if I find something similar that could be implemented there as well :D