huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Scaling text classification / reusing models #4418

Closed · timsuchanek closed this issue 4 years ago

timsuchanek commented 4 years ago

If I have a system where I want to train many text classifiers, one per user, how could I go about it with the transformers library in a scalable way? Right now I would have to run roughly 10 minutes of training per user on an RTX 2080 Ti for ALBERT on the dataset I have. That doesn't scale to thousands of users.

If I understand correctly, in the sequence classification models the whole transformer is trained, so backpropagation happens through the entire network.

However, if I now want to reuse the model for another user, perhaps passing in a little more labeled data to customize a base classifier, how could I go about that?

It seems to me that I would basically have to "freeze" the whole BERT-style base model, whichever one I use, and then train only a thin layer on top.
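
Roughly the setup I am imagining, as a minimal sketch: a frozen, shared encoder plus a cheap per-user classifier on top. The checkpoint, the mean pooling, and the scikit-learn head are just illustrative choices, not something settled here.

import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Shared, frozen encoder: loaded once and reused for every user.
tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
encoder = AutoModel.from_pretrained("albert-base-v2")
encoder.eval()

@torch.no_grad()
def embed(texts):
    # Tokenize and mean-pool the last hidden states into fixed-size vectors.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state
    return hidden.mean(dim=1).numpy()

# Per-user "training" is now only a small classifier on precomputed features.
user_texts = ["great product, works as advertised", "support never answered"]
user_labels = [1, 0]
clf = LogisticRegression().fit(embed(user_texts), user_labels)
print(clf.predict(embed(["very helpful team"])))

The expensive transformer forward pass would be shared across users, so adding a new user only costs fitting the small head.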

One possibility I see would be KNN over sentence-transformers embeddings; I have already asked about that in the repo there: https://github.com/UKPLab/sentence-transformers/issues/209
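
A rough sketch of that KNN idea, with the embedding model name and k as placeholders:

from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

# One shared embedding model for all users.
st_model = SentenceTransformer("all-MiniLM-L6-v2")

# Each user contributes a handful of labeled examples.
user_texts = ["I want a refund", "love this feature"]
user_labels = ["complaint", "praise"]

# Per-user classifier: nearest neighbour in embedding space.
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(st_model.encode(user_texts), user_labels)
print(knn.predict(st_model.encode(["please give me my money back"])))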

Maybe someone here has an idea of which approach would make sense in such a situation.

Thanks!

julien-c commented 4 years ago

In PyTorch you can pretty easily "freeze" the parameters you don't want to backpropagate through:

# `parameters` is the iterable of weights to freeze, e.g. model.base_model.parameters()
for param in parameters:
    param.requires_grad = False
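
For instance, a sketch assuming an ALBERT sequence classification model (the checkpoint and label count are just placeholders): freezing the base model leaves only the classification head trainable.

from transformers import AlbertForSequenceClassification

model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

# Freeze the shared encoder; only the classification head keeps gradients.
for param in model.base_model.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
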
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.