dibya-pati opened 1 year ago
Hi @dibya-pati, I encourage you to read through the other questions; you'll find answers there.
This figure compares different loss functions, some using a pointwise loss and some a pairwise loss (each positive has a corresponding negative), and Sampled Softmax appears to perform better across tasks than the other losses.
Hi, I'm trying to understand the loss computation for the MovieLens retrieval example. The MovieLens dataset has ~900 users and ~1,600 movies. When we train the two-tower model on user-item pairs, we treat only the current U_A-I_A pair as positive (using `tf.eye()` for the labels) and penalize every other U_A-I_B (B != A) combination in the batch as a negative. My question is:
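For reference, the in-batch setup described in the question can be sketched in plain NumPy. This is a minimal illustration, not the actual TFRS implementation: it assumes `user_emb` and `item_emb` are the batch outputs of the two towers, scores every user against every item in the batch, and uses the identity matrix as labels (the `tf.eye()` mentioned above), so the other B-1 items in the batch act as sampled negatives.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb):
    """Sketch of an in-batch sampled-softmax loss for a two-tower model."""
    # Score every user in the batch against every item in the batch.
    logits = user_emb @ item_emb.T                      # shape (B, B)
    # Identity labels: the i-th user's positive is the i-th item;
    # the remaining B-1 items in the batch serve as negatives.
    labels = np.eye(len(user_emb))
    # Numerically stable log-softmax over each row of logits.
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    # Softmax cross-entropy, averaged over the batch.
    return -(labels * log_probs).sum(axis=1).mean()

rng = np.random.default_rng(0)
users = rng.normal(size=(4, 8))   # hypothetical user-tower embeddings
items = rng.normal(size=(4, 8))   # hypothetical item-tower embeddings
loss = in_batch_softmax_loss(users, items)
```

Note that when a user and its positive item embed close together (and far from the other items in the batch), the diagonal logit dominates its row and the loss approaches zero; the off-diagonal terms are what "penalize" the in-batch negatives.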