Open smadha opened 8 years ago
Actually, we can try a much simpler way: expand each training pair with the features of its question and its user.
invited_info_train.txt
Q_1 U_1 0
Q_2 U_2 1
..
..
Q_n U_n 0
Final training data -
Q_1F_1 Q_1F_2 Q_1F_3 U_1F_1 U_1F_2 U_1F_3 U_1F_4 0
Q_2F_1 Q_2F_2 Q_2F_3 U_2F_1 U_2F_2 U_2F_3 U_2F_4 1
..
..
Q_nF_1 Q_nF_2 Q_nF_3 U_nF_1 U_nF_2 U_nF_3 U_nF_4 0
We can now train any classifier, such as a BDT or an SVM, and get a model.
Test data can be formed the same way, without the label:
Q_iF_1 Q_iF_2 Q_iF_3 U_iF_1 U_iF_2 U_iF_3 U_iF_4
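The expansion above can be sketched roughly as follows. This is only a minimal illustration: the feature tables, their values, and the helper names `expand_pair` and `build_training_rows` are hypothetical, and it assumes each line of invited_info_train.txt is a whitespace-separated `question user label` triple.

```python
# Hypothetical feature tables; real values would be derived from
# question_info.txt and user_info.txt.
question_features = {
    "Q_1": [0.1, 0.2, 0.3],
    "Q_2": [0.4, 0.5, 0.6],
}
user_features = {
    "U_1": [1.0, 2.0, 3.0, 4.0],
    "U_2": [5.0, 6.0, 7.0, 8.0],
}


def expand_pair(question_id, user_id):
    """Concatenate the question's features with the user's features."""
    return question_features[question_id] + user_features[user_id]


def build_training_rows(lines):
    """Turn 'question user label' lines into (feature_row, label) tuples."""
    rows = []
    for line in lines:
        question_id, user_id, label = line.split()
        rows.append((expand_pair(question_id, user_id), int(label)))
    return rows


# Assumed line format: whitespace-separated question ID, user ID, label.
invited_info_train = ["Q_1 U_1 0", "Q_2 U_2 1"]
training_rows = build_training_rows(invited_info_train)

# A test row is built the same way, just without a label.
test_row = expand_pair("Q_1", "U_2")
```

The resulting rows can be handed to whichever classifier is chosen; nothing here depends on a particular model.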
Once we create clusters, we can remove the bag-of-words features and replace them with cluster IDs. We will create separate clusters based on the word ID sequences and the character ID sequences in user_info.txt and question_info.txt.
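As a rough illustration of swapping bag-of-words features for a cluster ID, here is a small sketch. The clustering algorithm is not specified above, so plain k-means is an assumption, as are the toy word ID sequences, the vocabulary size, and every function name.

```python
from collections import Counter


def bag_of_words(word_ids, vocab_size):
    """Count how often each word ID occurs in one sequence."""
    counts = Counter(word_ids)
    return [float(counts.get(i, 0)) for i in range(vocab_size)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=10):
    """Plain k-means (an assumed choice); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical word ID sequences keyed by question ID.
word_sequences = {
    "Q_1": [0, 1, 1],
    "Q_2": [2, 3],
    "Q_3": [0, 1],
    "Q_4": [3, 3, 2],
}
vocab_size = 4
ids = sorted(word_sequences)
vectors = [bag_of_words(word_sequences[q], vocab_size) for q in ids]
centroids = kmeans(vectors, k=2)

# The cluster ID replaces the whole bag-of-words vector as a single feature.
cluster_id = {q: nearest(v, centroids) for q, v in zip(ids, vectors)}
```

The same procedure could be run separately on character ID sequences, and on users as well as questions, giving one cluster ID feature per clustering.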
For every (user, question) pair (U_i, Q_i) in the test data, find all the users in the training data who have already answered or ignored question Q_i; let's call them U_a and U_ig. From here we can make it a binary classifier, where users who ignored the question become class 0 and users who answered become class 1.
Similarly, we can find all the questions in the training data already answered or ignored by user U_i and build a classifier using the same method.
Clusters can be added as features in this model.
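The per-question grouping could look something like the sketch below. The sample lines, the user feature values, and the function names are hypothetical; the labels follow the convention above (1 = answered, 0 = ignored).

```python
from collections import defaultdict


def group_by_question(lines):
    """Map each question ID to the users who answered it and who ignored it."""
    answered = defaultdict(set)  # question ID -> U_a
    ignored = defaultdict(set)   # question ID -> U_ig
    for line in lines:
        question_id, user_id, label = line.split()
        if int(label) == 1:
            answered[question_id].add(user_id)
        else:
            ignored[question_id].add(user_id)
    return answered, ignored


def per_question_training_set(question_id, answered, ignored, user_features):
    """Build (user_features, label) rows for one question's binary classifier."""
    rows = [(user_features[u], 1) for u in sorted(answered.get(question_id, ()))]
    rows += [(user_features[u], 0) for u in sorted(ignored.get(question_id, ()))]
    return rows


# Hypothetical training lines and user features.
train_lines = ["Q_1 U_1 1", "Q_1 U_2 0", "Q_1 U_3 1", "Q_2 U_1 0"]
user_features = {
    "U_1": [1.0, 0.0],
    "U_2": [0.0, 1.0],
    "U_3": [1.0, 1.0],
}

answered, ignored = group_by_question(train_lines)
rows_for_q1 = per_question_training_set("Q_1", answered, ignored, user_features)
```

The per-user variant is symmetric: group by user ID instead and use question features as the rows. Cluster IDs, if available, would simply be appended to each feature row.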