kiudee / cs-ranking

Context-sensitive ranking and choice in Python with PyTorch
https://cs-ranking.readthedocs.io
Apache License 2.0

FETA subsampling not working #160

Open timokau opened 4 years ago

timokau commented 4 years ago

While working on #116, I noticed that the sub_sampling function of feta_network is broken. It's not exercised in our standard test suite, since it's only needed when the number of objects is higher than the 5 objects our test suite uses.

The function is implemented as follows:

def sub_sampling(self, X, Y):
    if self.n_objects_fit_ > self.max_number_of_objects:
        bucket_size = int(self.n_objects_fit_ / self.max_number_of_objects)
        idx = self.random_state_.randint(
            bucket_size, size=(len(X), self.n_objects_fit_)
        )
        # TODO: subsampling multiple rankings
        idx += np.arange(start=0, stop=self.n_objects_fit_, step=bucket_size)[
            : self.n_objects_fit_
        ]
        X = X[np.arange(len(X))[:, None], idx]
        Y = Y[np.arange(len(X))[:, None], idx]
        tmp_sort = Y.argsort(axis=-1)
        Y = np.empty_like(Y)
        Y[np.arange(len(X))[:, None], tmp_sort] = np.arange(self.n_objects_fit_)
    return X, Y

and breaks at the idx += line because of a shape mismatch. It's trying to add arrays like

[[0 1 0 0 0]
 [0 0 1 1 0]]

and

[0 2 4]

i.e. a 2D array of shape (2, 5) and a 1D array of length 3, which cannot be broadcast together. I'm not sure how this sampling is supposed to work. Is the intention documented somewhere, @kiudee @prithagupta?
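For reference, the failure can be reproduced outside the class with plain NumPy. This is a minimal sketch: the values for n_instances, n_objects_fit and max_number_of_objects are assumptions chosen to match the arrays printed above, not values taken from the library.

```python
import numpy as np

# Assumed values, chosen to reproduce the arrays shown in the issue.
n_instances = 2
n_objects_fit = 5
max_number_of_objects = 2

rng = np.random.RandomState(0)

bucket_size = int(n_objects_fit / max_number_of_objects)  # 2
# One offset in [0, bucket_size) per instance and object: shape (2, 5).
idx = rng.randint(bucket_size, size=(n_instances, n_objects_fit))
# Start index of each bucket: [0 2 4], shape (3,).
offsets = np.arange(start=0, stop=n_objects_fit, step=bucket_size)[:n_objects_fit]

try:
    # Shapes (2, 5) and (3,) are not broadcast-compatible, so this raises.
    idx += offsets
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(idx.shape, offsets.shape, error_message)
```

The last axis of idx has length n_objects_fit, while offsets has one entry per bucket, so the two only line up when bucket_size is 1.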

prithagupta commented 4 years ago

@timokau The example you gave is for the choice function, where sub_sampling is overridden in feta_choice. For discrete choice, this function will produce an error, so we need to implement it for discrete choice as well.

timokau commented 4 years ago

So this implementation should always be overridden? Could we just remove it then?

prithagupta commented 4 years ago

I think we are using it for ranking. We could move it into the FetaObjectRanking class, and we should think about a subsampling method for discrete choice.
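One possible reading of the ranking case is "draw one object from each bucket, then re-rank the drawn objects". The sketch below illustrates that reading only. It is not the library's implementation, the intended behaviour is not confirmed anywhere in this thread, and the function name bucket_subsample and its parameters are hypothetical.

```python
import numpy as np


def bucket_subsample(X, Y, n_sampled, rng):
    """Hypothetical helper: sample one object per bucket and re-rank.

    X: features, shape (n_instances, n_objects, n_features)
    Y: rankings, shape (n_instances, n_objects), each row a permutation
    """
    n_instances, n_objects = Y.shape
    bucket_size = n_objects // n_sampled
    # One random offset per instance and per bucket: shape (n_instances, n_sampled).
    idx = rng.randint(bucket_size, size=(n_instances, n_sampled))
    # Shift each column to the start of its bucket; shape (n_sampled,) broadcasts.
    idx += np.arange(n_sampled) * bucket_size
    rows = np.arange(n_instances)[:, None]
    X_sub = X[rows, idx]
    Y_sub = Y[rows, idx]
    # Double argsort maps the surviving ranks onto 0..n_sampled-1,
    # preserving their relative order.
    Y_sub = Y_sub.argsort(axis=-1).argsort(axis=-1)
    return X_sub, Y_sub


rng = np.random.RandomState(42)
X = rng.randn(3, 6, 2)
Y = np.array([rng.permutation(6) for _ in range(3)])
X_sub, Y_sub = bucket_subsample(X, Y, n_sampled=3, rng=rng)
print(X_sub.shape, Y_sub.shape)
```

Under this reading, objects beyond n_sampled * bucket_size are never drawn when the number of objects is not a multiple of n_sampled, and objects are drawn by position rather than by rank, so whether it matches the original intention remains open.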