Open tarohi24 opened 4 years ago
It may not be so bad. For debugging, I replaced the TF-IDF values with TF values, and it works well.
So there are multiple issues.
However I choose the keywords (dimensions), I can calculate fBoW.
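A minimal sketch of that calculation, under a loudly stated assumption: here fBoW is taken to be a fuzzy bag-of-words vector in which each keyword dimension accumulates the cosine similarity (clipped at 0) between every document word and that keyword. The thread does not define fBoW, so this definition, the helper names `cosine`, `fbow` and `fractions`, and the toy 2-d embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fbow(word_vecs, keyword_vecs):
    """Assumed fBoW: one dimension per keyword, summing clipped similarities."""
    vec = [0.0] * len(keyword_vecs)
    for w in word_vecs:
        for j, k in enumerate(keyword_vecs):
            vec[j] += max(0.0, cosine(w, k))
    return vec


def fractions(vec):
    """Share of the total mass that falls on each keyword dimension."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


# Toy data (hypothetical): two keywords, three document words.
keywords = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]

doc_fbow = fbow(words, keywords)
doc_fractions = fractions(doc_fbow)
```

Because the keywords enter only as columns of the similarity computation, any choice of keywords yields a vector, which is why the selection step and the fBoW step can be debugged separately.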
For debugging, I ran a trial where # keywords = 2.
Contrary to what I expected, the fraction contributed to each dimension is as follows:
0.3 0.7
0.4 0.6
..
For simplicity, I didn't note the numbers in detail. There are some important observations:
How about adding an activation function so that the loss function truly chooses the optimal 'topics' that cover the whole document?
i.e. remove words that have already been covered (meaning the similarity between the word and a keyword is above 0.8).
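A rough sketch of that filtering step, assuming "similarity" means cosine similarity between word and keyword embeddings. The function name `remove_covered`, the dict-of-vectors input, and the toy embeddings are hypothetical; only the 0.8 threshold comes from the comment above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def remove_covered(words, keyword_vecs, threshold=0.8):
    """Drop every word whose similarity to any keyword exceeds the threshold."""
    return {
        w: vec
        for w, vec in words.items()
        if all(cosine(vec, k) <= threshold for k in keyword_vecs)
    }


# Toy data (hypothetical): one keyword already selected, three candidate words.
selected_keywords = [[1.0, 0.0]]
candidates = {
    "a": [1.0, 0.1],  # nearly parallel to the keyword, so treated as covered
    "b": [0.0, 1.0],  # orthogonal to the keyword
    "c": [1.0, 1.0],  # similarity is about 0.71, below the threshold
}

remaining = remove_covered(candidates, selected_keywords)
```

Running this after each keyword is picked would leave only the uncovered words for the next round of selection, which is one way to push the keywords toward covering the whole document.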
I guess this is because IDF is considered in the loss but not in the rerank.
IDF isn't necessary for calculating IDF (because each keyword represents a 'concept').
I expected the fBoW values of a query to be almost sorted in descending order, but they aren't.