Open HansWurst90 opened 3 years ago
Are you asking about the retrieval loss, or the retrieval metric?
Assuming you are asking about the loss, the general idea of retrieval models is that user clicks (etc) are the ground truth. If a user clicked on an item, the model should predict a high score for that item. Anything the user did not click on should receive a lower score.
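As a rough illustration of that idea, here is a minimal sketch assuming an in-batch softmax formulation: each query is scored against every candidate in the batch, and the candidate the user actually clicked is treated as the correct class. The function name and the toy embeddings are hypothetical and not taken from the library.

```python
import math

def in_batch_softmax_loss(query_embs, candidate_embs):
    """Summed cross-entropy; row i of each list is one observed (query, candidate) pair."""
    total = 0.0
    for i, q in enumerate(query_embs):
        # Score this query against every candidate in the batch (dot product).
        scores = [sum(a * b for a, b in zip(q, c)) for c in candidate_embs]
        # Numerically stable log-sum-exp over all scores.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # The clicked candidate sits at index i; a low score for it costs more.
        total += log_z - scores[i]
    return total

# Hypothetical 2-D embeddings for two (user, item) interactions.
users = [[1.0, 0.0], [0.0, 1.0]]
clicked_items = [[2.0, 0.0], [0.0, 2.0]]   # each user is paired with a well-matching item
swapped_items = [[0.0, 2.0], [2.0, 0.0]]   # same items, paired with the wrong user

aligned_loss = in_batch_softmax_loss(users, clicked_items)
swapped_loss = in_batch_softmax_loss(users, swapped_items)
print(aligned_loss, swapped_loss)
```

The loss is smaller when the clicked item scores higher than the other items in the batch, which is the behaviour described above.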
My question is about how the factorized_top_k accuracy metric is calculated internally. I'm having a hard time understanding it from the source code. Can you explain it in simpler terms than the comments in the source code?
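For reference, top-k accuracy in a retrieval setting is usually defined as follows: for each observed (query, candidate) pair, score the query against all candidates and check whether the observed candidate lands among the k highest scores. The sketch below illustrates that definition only. It is not the library's internal code, which may batch the computation and handle ties differently, and all names and embeddings in it are hypothetical.

```python
def top_k_accuracy(query_embs, true_candidate_ids, candidate_embs, k):
    """Fraction of queries whose observed candidate is among the k best-scoring candidates."""
    hits = 0
    for q, true_id in zip(query_embs, true_candidate_ids):
        # Dot-product score of this query against every known candidate.
        scores = {
            cid: sum(a * b for a, b in zip(q, c))
            for cid, c in candidate_embs.items()
        }
        # Candidate ids sorted by descending score, truncated to the top k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        # The ground truth is the candidate paired with this query in the data.
        hits += true_id in top_k
    return hits / len(query_embs)

# Hypothetical candidate embeddings keyed by content id.
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
# Hypothetical query embeddings, one per observed view, with the content viewed.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
viewed = ["a", "b", "c"]

top_1 = top_k_accuracy(queries, viewed, candidates, k=1)
top_2 = top_k_accuracy(queries, viewed, candidates, k=2)
print(top_1, top_2)
```

In this sketch the ground truth comes from the observed pairs (`viewed`), and the order in which the candidates are listed does not affect the result, since only the scores determine the ranking.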
Hello everyone,
I went through all the quickstart tutorials and adapted the basic_retrieval example to my dataset.
views_df contains pairs of user_ids and content_ids, each representing a user viewing a piece of content.
Dataset and Result
The dataset is fairly small (1026 views from 63 users on 187 contents) but the code seems to work and my results are as follows:
Train:
factorized_top_k/top_1_categorical_accuracy: 0.0012
factorized_top_k/top_5_categorical_accuracy: 0.0816
factorized_top_k/top_10_categorical_accuracy: 0.2046
factorized_top_k/top_50_categorical_accuracy: 0.7430
factorized_top_k/top_100_categorical_accuracy: 0.8965
loss: 494.7287
Test:
factorized_top_k/top_1_categorical_accuracy: 0.0
factorized_top_k/top_5_categorical_accuracy: 0.0243
factorized_top_k/top_10_categorical_accuracy: 0.0585
factorized_top_k/top_50_categorical_accuracy: 0.3804
factorized_top_k/top_100_categorical_accuracy: 0.6146
loss: 31.29269790649414
Question
I am unsure whether I created the query embeddings and candidate embeddings correctly from my dataset for calculating the FactorizedTopK metric. I am also having trouble understanding the computation of the FactorizedTopK metric in general. I looked at the source code but don't understand its explanation of how the metric is calculated.
Where does it take the ground truth from? Aren't the query and candidate embeddings just lists of all users and contents? Does the order of the lists matter? Can someone explain the computation of the FactorizedTopK metric in simpler terms?
Thanks in advance
Code