Question about HFT algorithm

ChenSuL commented 6 years ago

I find that the performance of this algorithm is not good. According to the paper, the HFT model adds a regularization term to the BiasedMF model, so shouldn't it perform better than BiasedMF?
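For reference, here is a rough paraphrase of the two objectives being compared. The symbols are ours, not LibRec's: $d$ ranges over the review documents $\mathcal{T}$, $i_d$ is the item that $d$ is about, $w_{d,n}$ and $z_{d,n}$ are word $n$ of $d$ and its topic assignment, $\phi$ holds the topic-word distributions, $\kappa$ links item factors to topic proportions, and $\eta$ trades rating error against review likelihood (an implementation may also keep BiasedMF's L2 term). The review term only ties $q_i$ to topic proportions. It does not directly target rating error, so HFT is not guaranteed to beat BiasedMF.

$$
\hat r_{u,i} = \mu + b_u + b_i + p_u^{\top} q_i
$$

$$
\mathcal{L}_{\text{BiasedMF}} = \sum_{(u,i)\in\mathcal{R}} \left(r_{u,i} - \hat r_{u,i}\right)^2 + \lambda\,\Omega(b, p, q)
$$

$$
\mathcal{L}_{\text{HFT}} = \sum_{(u,i)\in\mathcal{R}} \left(r_{u,i} - \hat r_{u,i}\right)^2 - \eta \sum_{d\in\mathcal{T}} \sum_{n=1}^{N_d} \log\!\left(\theta_{i_d,\,z_{d,n}}\,\phi_{z_{d,n},\,w_{d,n}}\right),
\qquad
\theta_{i,k} = \frac{\exp(\kappa\, q_{i,k})}{\sum_{k'} \exp(\kappa\, q_{i,k'})}
$$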

In addition, this algorithm takes a very long time to run and can't scale to large data sets. Could you implement the L-BFGS training algorithm?
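As a starting point for the L-BFGS request, here is a generic minimal sketch of L-BFGS (two-loop recursion plus Armijo backtracking). It is not LibRec code: the class `LbfgsSketch`, the `Objective` interface, and the parameters `m` (history size), `maxIter`, and `tol` are hypothetical names chosen for illustration. It also assumes a smooth objective whose gradient you can compute.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, self-contained L-BFGS sketch (not a LibRec API). */
final class LbfgsSketch {
    /** Returns f(x) and writes the gradient of f at x into grad. */
    interface Objective {
        double valueAndGradient(double[] x, double[] grad);
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** y += a * x */
    static void axpy(double a, double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) y[i] += a * x[i];
    }

    static double[] minimize(Objective f, double[] x0, int m, int maxIter, double tol) {
        int n = x0.length;
        double[] x = x0.clone();
        double[] g = new double[n];
        double fx = f.valueAndGradient(x, g);
        List<double[]> sHist = new ArrayList<>(), yHist = new ArrayList<>();
        List<Double> rhoHist = new ArrayList<>();

        for (int iter = 0; iter < maxIter && Math.sqrt(dot(g, g)) > tol; iter++) {
            // Two-loop recursion: q ends up as the inverse-Hessian approximation applied to g.
            double[] q = g.clone();
            int k = sHist.size();
            double[] alpha = new double[k];
            for (int i = k - 1; i >= 0; i--) {           // newest pair to oldest
                alpha[i] = rhoHist.get(i) * dot(sHist.get(i), q);
                axpy(-alpha[i], yHist.get(i), q);
            }
            if (k > 0) {                                  // initial scaling gamma = s'y / y'y
                double[] sLast = sHist.get(k - 1), yLast = yHist.get(k - 1);
                double gamma = dot(sLast, yLast) / dot(yLast, yLast);
                for (int j = 0; j < n; j++) q[j] *= gamma;
            }
            for (int i = 0; i < k; i++) {                 // oldest pair to newest
                double beta = rhoHist.get(i) * dot(yHist.get(i), q);
                axpy(alpha[i] - beta, sHist.get(i), q);
            }
            double[] d = new double[n];
            for (int j = 0; j < n; j++) d[j] = -q[j];
            double gd = dot(g, d);
            if (gd >= 0) {                                // not a descent direction: reset to steepest descent
                sHist.clear(); yHist.clear(); rhoHist.clear();
                for (int j = 0; j < n; j++) d[j] = -g[j];
                gd = dot(g, d);
            }

            // Backtracking line search with the Armijo sufficient-decrease condition.
            double step = 1.0, fNew;
            double[] xNew = new double[n], gNew = new double[n];
            while (true) {
                for (int j = 0; j < n; j++) xNew[j] = x[j] + step * d[j];
                fNew = f.valueAndGradient(xNew, gNew);
                if (fNew <= fx + 1e-4 * step * gd || step < 1e-12) break;
                step *= 0.5;
            }

            // Keep the curvature pair only if s'y > 0, so the approximation stays positive definite.
            double[] s = new double[n], y = new double[n];
            for (int j = 0; j < n; j++) { s[j] = xNew[j] - x[j]; y[j] = gNew[j] - g[j]; }
            double sy = dot(s, y);
            if (sy > 1e-12) {
                if (sHist.size() == m) { sHist.remove(0); yHist.remove(0); rhoHist.remove(0); }
                sHist.add(s); yHist.add(y); rhoHist.add(1.0 / sy);
            }
            x = xNew; g = gNew; fx = fNew;
        }
        return x;
    }
}
```

To use this for HFT, you would flatten all continuous parameters into one `double[]` and supply the objective value and gradient through `Objective`. As we understand the paper's formulation, the discrete topic assignments $z$ are not handled by L-BFGS and would still need their own update step between L-BFGS runs.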

Thank you!

SunYatong commented 6 years ago

Hi @ChenSuL , thanks for your feedback.

I tested your code and found that "trainMatrix.get(u, j) != 0" is more efficient than "itemSet.contains(negItemIdx)".
I also found that caching each user's item list is more efficient than caching their SparseVectors.

Here is my code:

        // cache each user's item list
        List<List<Integer>> userItemsList = new ArrayList<>();
        for (int userIdx = 0; userIdx < numUsers; ++userIdx) {
            userItemsList.add(new ArrayList<>(trainMatrix.getColumns(userIdx)));
        }

        for (int iter = 1; iter <= numIterations; iter++) {
            loss = 0.0d;
            for (int sampleCount = 0; sampleCount < numUsers * 100; sampleCount++) {
                // randomly draw (userIdx, posItemIdx, negItemIdx)
                int userIdx, posItemIdx, negItemIdx;

                while (true) {
                    userIdx = Randoms.uniform(numUsers);

                    // skip users with no observed items or with every item observed
                    List<Integer> itemList = userItemsList.get(userIdx);
                    if (itemList.isEmpty() || itemList.size() == numItems)
                        continue;

                    posItemIdx = itemList.get(Randoms.uniform(itemList.size()));
                    // resample until the item is unobserved for this user
                    do {
                        negItemIdx = Randoms.uniform(numItems);
                    } while (trainMatrix.get(userIdx, negItemIdx) != 0);

                    break;
                }

                // ... BPR gradient update for (userIdx, posItemIdx, negItemIdx) ...
            }
        }
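The snippet stops after sampling, so here is a self-contained sketch of one BPR-MF epoch that pairs the same sampling scheme with the standard BPR stochastic-gradient update. It is an illustration, not LibRec's implementation: `BprSketch`, `epoch`, `learnRate`, and `reg` are hypothetical names, the factors are plain arrays, and each user's items are assumed to be stored as a sorted `int[]`, so `Arrays.binarySearch` stands in for `trainMatrix.get(u, j) != 0`.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of BPR-MF training with plain arrays (not LibRec code). */
final class BprSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
    }

    /**
     * Runs numSamples SGD steps and returns the summed BPR loss, -ln sigmoid(x_uij).
     * userItems[u] must be sorted ascending; P holds user factors, Q holds item factors.
     */
    static double epoch(int[][] userItems, int numItems, double[][] P, double[][] Q,
                        int numSamples, double learnRate, double reg, Random rnd) {
        double loss = 0.0;
        int numUsers = userItems.length;
        for (int sample = 0; sample < numSamples; sample++) {
            int u, i, j;
            while (true) {
                u = rnd.nextInt(numUsers);
                int[] items = userItems[u];
                // skip users with no observed items or with every item observed
                if (items.length == 0 || items.length == numItems) continue;
                i = items[rnd.nextInt(items.length)];
                // resample until j is an item the user has not interacted with
                do {
                    j = rnd.nextInt(numItems);
                } while (Arrays.binarySearch(items, j) >= 0);
                break;
            }

            double xuij = dot(P[u], Q[i]) - dot(P[u], Q[j]);
            loss += -Math.log(sigmoid(xuij));
            double g = sigmoid(-xuij); // derivative of ln sigmoid(x) at x = xuij

            // gradient ascent on ln sigmoid(x_uij) with L2 regularization
            for (int k = 0; k < P[u].length; k++) {
                double pu = P[u][k], qi = Q[i][k], qj = Q[j][k];
                P[u][k] += learnRate * (g * (qi - qj) - reg * pu);
                Q[i][k] += learnRate * (g * pu - reg * qi);
                Q[j][k] += learnRate * (-g * pu - reg * qj);
            }
        }
        return loss;
    }
}
```

Usage: initialize `P` and `Q` with small random values, then call `epoch` once per iteration. Sorted arrays are just one compact choice here; any structure with a fast membership test would work in the rejection loop.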

SunYatong commented 6 years ago

I have tested the performance of BiasedMF and HFT (using its authors' code), and the improvement from HFT is very limited; in some cases it is even worse.

L-BFGS training might be implemented in the future, but we have no specific plan for it yet.

BTW, does my BPR implementation above run faster than your version on your machine?

ChenSuL commented 6 years ago

Thank you for your kind reply @SunYatong

Your BPR implementation works fine. Sorry, I raised the BPR question by mistake: I forgot that I was using version 1.3. The problem has already been solved in version 2.0.

SunYatong commented 6 years ago

Thank you all the same. It is still a valuable enhancement as I found that "trainMatrix.get(u, j)!=0" is more efficient than "itemSet.contains(negItemIdx)".

guoguibing / librec

Question about HFT algorithm #247