notprime closed this issue 6 months ago
oops, ty
@lucidrains no worries Phil, it's been a pleasure! btw, I was thinking: sometimes it makes sense to compute a loss not only on the logits/predictions (as with CrossEntropy) but also on the embeddings (e.g. ArcFace and similar losses). Looking at the code, the embeddings are returned only if the mlp head does not exist or if return_embeddings is set to True, so if someone wants both the logits and the embeddings they have to run the forward pass twice (once with return_embeddings set to True, once with it set to False). Would it make sense to you to modify the class a bit, so it can return both when needed? I've seen lots of papers using a combination of CrossEntropy and ArcFace, for example. I can easily handle it if that's not a problem
@notprime what is ArcFace?
yea in the middle of something, but i can add that quickly (famous last words)
@notprime oh i have it already, as return_logits_and_embeddings = True
is this what you are looking for?
@lucidrains,
ArcFace paper: https://arxiv.org/pdf/1801.07698.pdf. It is basically a loss function mainly used in biometric settings (it was first used for face recognition, but it can basically be used in any classification setting). Instead of only separating classes through softmax, you also project the embeddings so as to minimize the intra-class distance and maximize the inter-class distance.
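For anyone skimming this thread, here's a minimal sketch of the ArcFace idea in PyTorch: add an angular margin to the target class before scaling and handing the result to cross entropy. Function name and hyperparameters (s, m) are illustrative, not from this repo or the paper's official code:

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, s=64.0, m=0.5):
    # cosine similarity between L2-normalized embeddings and class weights
    cosine = F.normalize(embeddings, dim=-1) @ F.normalize(weight, dim=-1).t()
    # recover the angle, clamping for numerical safety
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    # add the angular margin m only to the target class
    # (real implementations add a fallback for theta + m > pi; omitted here)
    target = F.one_hot(labels, num_classes=weight.shape[0]).bool()
    theta = torch.where(target, theta + m, theta)
    # rescale so softmax + cross entropy behaves well
    return s * torch.cos(theta)
```

These margin-adjusted logits are then fed straight into F.cross_entropy, which is why having the embeddings alongside the regular logits in one forward pass is handy.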
Yes, return_logits_and_embeddings is what I was thinking about, but I've seen it only in the TransformerWrapper class
@notprime ohh i see! yea let me throw it in there
@notprime ok done
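For readers of this thread, the behavior being discussed boils down to a flag on forward. A toy stand-in (not the actual TransformerWrapper/ViTransformerWrapper code, and all module names here are hypothetical):

```python
import torch
import torch.nn as nn

class ToyWrapper(nn.Module):
    """Illustrative stand-in: backbone produces embeddings, mlp head produces logits."""
    def __init__(self, in_dim=8, dim=16, num_classes=10):
        super().__init__()
        self.backbone = nn.Linear(in_dim, dim)   # stands in for the transformer
        self.mlp_head = nn.Linear(dim, num_classes)

    def forward(self, x, return_logits_and_embeddings=False):
        embed = self.backbone(x)
        logits = self.mlp_head(embed)
        if return_logits_and_embeddings:
            # one forward pass, both outputs: lets you combine e.g.
            # CrossEntropy on logits with ArcFace on embeddings
            return logits, embed
        return logits
```

With this, `logits, embed = model(x, return_logits_and_embeddings=True)` gives both tensors without running the network twice.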
@lucidrains saw it, thanks!
I'll be heavily working with your vit repo these days; I'll open issues if I find anything similar that could be implemented there too :D
ViTransformerWrapper was missing the self.post_emb_norm layer at line 1460, which threw an error when forwarding an input