In different scenarios, we sometimes deal with bipartite formatted data, but sometimes the bipartite datasets are converted to the monopartite form, monopartite meaning that X is formed by pairwise concatenations of all possible X[0] and X[1] rows, and bipartite meaning X = [X[0], X[1]].
Even more so since some estimators do accept both types of input for predict() (tree-based models in general) while others only accept the bipartite format (the matrix factorization ones, for instance), but all of them should yield flattened predictions for better integration with scikit-learn scoring utilities, which I reckon can be quite confusing.
I suppose an estimator tag would be an appropriate way of signaling that.
Maybe a whole Dataset class would facilitate maintenance in the long term.
In different scenarios, we sometimes deal with bipartite formatted data, but sometimes the bipartite datasets are converted to the monopartite form, monopartite meaning that
X
is formed by pairwise concatenations of all possibleX[0]
andX[1]
rows, and bipartite meaningX = [X[0], X[1]]
.As mentioned in https://github.com/pedroilidio/bipartite_learn/issues/5#issuecomment-1509892869, the way of distinguishing between these two formats deserves more careful solutions than what we currently do:
https://github.com/pedroilidio/bipartite_learn/blob/84998676b7847c5564cdacfc1d43d269e4eb6140/bipartite_learn/utils/__init__.py#L40-L42
Even more so since some estimators do accept both types of input for
predict()
(tree-based models in general) while others only accept the bipartite format (the matrix factorization ones, for instance), but all of them should yield flattened predictions for better integration withscikit-learn
scoring utilities, which I reckon can be quite confusing.Dataset
class would facilitate maintenance in the long term.