tarohi24 / bowi_archive

BoW Improved
MIT License

fBoW not working #10

Open tarohi24 opened 4 years ago

tarohi24 commented 4 years ago

I expected the fBoW values of a query to be almost sorted in descending order, but they aren't.

tarohi24 commented 4 years ago

It may not be so bad. For debugging, I replaced the TF-IDF values with TF values, and it works well.

tarohi24 commented 4 years ago

So there are multiple issues.

How to select keywords?

However I choose the keywords (dimensions), I can calculate fBoW.

tarohi24 commented 4 years ago

For debugging, I ran a trial with the number of keywords set to 2.

Contrary to what I expected, the fraction of contribution to each dimension is as follows:

0.3 0.7
0.4 0.6
..

For simplicity, I didn't note the numbers in detail. There are some important observations:

  1. The 2nd keyword is always more responsible than the 1st keyword.
  2. The contributions sum to 1 (as I expected).
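
The two observations above can be checked mechanically. This is only a minimal sketch: the rows are the two rounded examples noted above, and the helper names `sums_to_one` and `dominant_dimension` are made up for illustration. They are not functions from this repository.

```python
import math

# Rounded example rows from the trial: (1st keyword, 2nd keyword).
fractions = [(0.3, 0.7), (0.4, 0.6)]


def sums_to_one(row, tol=1e-9):
    """Return True if the contributions in a row add up to 1."""
    return math.isclose(sum(row), 1.0, abs_tol=tol)


def dominant_dimension(row):
    """Return the 0-based index of the keyword with the largest contribution."""
    return max(range(len(row)), key=lambda i: row[i])


# Observation 2: every row is normalized.
all_normalized = all(sums_to_one(row) for row in fractions)

# Observation 1: index 1 (the 2nd keyword) dominates in every row.
dominant = [dominant_dimension(row) for row in fractions]
```

Running the same check over the full, unrounded output would show whether the 2nd keyword dominates for every query or only for the rows sampled here.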

tarohi24 commented 4 years ago

How about adding an activation function so that the loss function truly chooses the optimal 'topics' that cover the whole document?

i.e., remove words that have already been covered (meaning the similarity between the word and a keyword is above 0.8).
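
A rough sketch of that filtering step, under some assumptions: similarity is taken to be cosine similarity between embedding vectors, the 2-d vectors are toy values, and `uncovered_words` is a hypothetical helper, not something that exists in this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def uncovered_words(word_vecs, keyword_vecs, threshold=0.8):
    """Keep only the words that no keyword covers.

    A word counts as covered when its similarity to at least one
    keyword is above `threshold`.
    """
    remaining = []
    for word, vec in word_vecs.items():
        best = max((cosine(vec, kw) for kw in keyword_vecs), default=0.0)
        if best <= threshold:
            remaining.append(word)
    return remaining


# Toy embeddings, made up for illustration only.
word_vecs = {"cat": (1.0, 0.0), "kitten": (0.9, 0.1), "car": (0.0, 1.0)}
keyword_vecs = [(1.0, 0.0)]

remaining = uncovered_words(word_vecs, keyword_vecs)
```

Here "cat" and "kitten" are both covered by the single keyword, so only "car" is left for the next keyword to explain.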

tarohi24 commented 4 years ago

I guess this is because IDF is considered in the loss but not in reranking.

tarohi24 commented 4 years ago

IDF isn't necessary for calculating fBoW (because each keyword represents a 'concept').