huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

A new category for recsys #26256

Open Anindyadeep opened 1 year ago

Anindyadeep commented 1 year ago

Feature request

A new category in HuggingFace (both in datasets and models) for recommendation systems.

Motivation

HuggingFace has a rich ecosystem of datasets and models. The available model types include:

  1. Language models
  2. Graph models
  3. Vision models
  4. Multimodal models, etc.

The same goes for datasets. However, one significant category that I could not find is recommendation systems. Recommendation systems are very important for enterprises, and they are one of the most interesting and dynamic fields in machine learning.

We can support recsys in several ways. There are recsys datasets built on tabular data, tabular + NLP data, vision data, etc., and the same goes for models. I have only just started learning and doing research on recsys, but I could not find any SOTA recsys models on HuggingFace.

For example, nothing like the following is possible right now, where I do not care much about the candidate generator, simply take a SOTA one off the shelf, and focus on my ranker model.

# NOTE: RecSysVocab and CandidateGen do not exist in transformers today;
# this is only the proposed interface.
from transformers import CandidateGen, RecSysVocab

# each of these can be a path to a CSV file or a matrix of user/item properties
user_vocab_lookup_table = RecSysVocab.from_pretrained('/path/to/user.csv')
item_vocab_lookup_table = RecSysVocab.from_pretrained('/path/to/item.csv')

# load a pretrained candidate generator model
candidate_gen = CandidateGen.from_pretrained('some-sota-candidate-gen')

# retrieve the top-k most similar items for a given user
candidate_gen.find_top_k_similarity(
    user_id="some user id",
    user_columns=[...],  # a vector with that user's properties
    user_vocab_lookup_table=user_vocab_lookup_table,
    item_vocab_lookup_table=item_vocab_lookup_table,
)

The above is very simple (and not fully accurate) pseudo code, just to give a glimpse of the interface. It would be awesome to have something like this built specifically for recommendation systems.
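To make the retrieval step more concrete, here is a minimal, library-free sketch of what a top-k similarity lookup could do under the hood. It assumes users and items are already embedded as plain float vectors and uses cosine similarity; the function names (`cosine_similarity`, `find_top_k_similar_items`) and the toy vectors are hypothetical and are not part of transformers or of any existing recsys library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_top_k_similar_items(
    user_vector: Sequence[float],
    item_vectors: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every item against the user vector and return the k best (id, score) pairs."""
    scored = [
        (item_id, cosine_similarity(user_vector, vector))
        for item_id, vector in item_vectors.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (hypothetical): three items and one user in a 2-dimensional space.
item_vectors = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [1.0, 1.0],
}
user_vector = [1.0, 0.0]

top_items = find_top_k_similar_items(user_vector, item_vectors, k=2)
print(top_items)
```

A real candidate generator would replace the brute-force scan with a learned embedding model and an approximate nearest-neighbour index, but the shape of the interface (user vector in, top-k item ids out) would stay roughly the same.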

Your contribution

I am not sure whether this has been considered before. It would be awesome to work on this if the core maintainers and contributors are on the same page. RecSys is very diverse: some methods are sequence-based, while others use a two-stage approach (candidate generation + filtering). So I feel some discussion would be required on how to structure the modules and how to build the interfaces so that they match the existing conventions of HuggingFace.
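As a rough illustration of the two-stage approach mentioned above, the sketch below first retrieves a small candidate set with a cheap dot-product score, then filters out already-seen items and reorders the rest with a separate ranker score. Everything here is an assumption made for illustration: the function names, the `popularity` table standing in for a ranker model, and the toy vectors are hypothetical and do not correspond to any existing HuggingFace interface.

```python
from typing import Callable, Dict, List, Sequence, Set


def generate_candidates(
    user_vector: Sequence[float],
    item_vectors: Dict[str, Sequence[float]],
    n_candidates: int,
) -> List[str]:
    """Stage 1: cheap retrieval, scoring every item by dot product with the user vector."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vector, vector))
        for item_id, vector in item_vectors.items()
    }
    ordered = sorted(scores, key=lambda item_id: scores[item_id], reverse=True)
    return ordered[:n_candidates]


def rank_candidates(
    candidates: List[str],
    seen_items: Set[str],
    ranker_score: Callable[[str], float],
    k: int,
) -> List[str]:
    """Stage 2: drop items the user has already seen, then reorder by the ranker score."""
    unseen = [item_id for item_id in candidates if item_id not in seen_items]
    return sorted(unseen, key=ranker_score, reverse=True)[:k]


# Toy data (hypothetical).
item_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.8, 0.3],
}
user_vector = [1.0, 0.0]
seen_items = {"a"}
# Stand-in for a learned ranker: a fixed popularity score per item.
popularity = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7}

candidates = generate_candidates(user_vector, item_vectors, n_candidates=3)
recommendations = rank_candidates(
    candidates, seen_items, ranker_score=lambda item_id: popularity[item_id], k=2
)
print(candidates)       # ['a', 'b', 'd']
print(recommendations)  # ['d', 'b']
```

Keeping the two stages as separate functions mirrors the use case described earlier: an off-the-shelf candidate generator can be swapped in while the ranker is developed independently.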

In terms of my contribution, I can help with this and am excited to contribute if I hear back from the community showing similar interest. I would love to contribute.

Anindyadeep commented 1 year ago

@ArthurZucker, just checking in to see whether this is still active, or to hear any thoughts on it.

Thanks

LysandreJik commented 1 year ago

Hello @Anindyadeep, for now I do not think we will add recommendation system capabilities to transformers unless there is a very large number of requests.

However, we'd be more than happy to help you or anyone else from the community integrate a recommendation system utility with our other tools and with the Hub; reading your code snippet, I understand this is the true value of what you offer.

Anindyadeep commented 1 year ago

Ah I see @LysandreJik, thanks for the update. At least for now, my use case has been served. However, it would be great if this feature were integrated at some point.