huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ZeroShotClassificationPipeline not using GPU #16931

Closed ierezell closed 2 years ago

ierezell commented 2 years ago

System Info

- `transformers` version: 4.18.0
- Platform: Linux-5.13.0-37-generic-x86_64-with-glibc2.31
- Python version: 3.9.10
- Huggingface_hub version: 0.4.0
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

Who can help?

Hello @Narsil, sorry to bother you once again...

When using a ZeroShotClassificationPipeline, it seems that a lot of preprocessing is done on CPU instead of GPU.

Reproduction

```python
from transformers.modeling_utils import PreTrainedModel
from transformers.models.auto.modeling_auto import AutoModelForSequenceClassification
from transformers.models.auto.tokenization_auto import AutoTokenizer
from transformers.pipelines.zero_shot_classification import ZeroShotClassificationPipeline
from transformers.tokenization_utils import PreTrainedTokenizer
import torch
import itertools

# Load the NLI model, move it to the GPU, and build the pipeline on device 0.
few_shot_classification_model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
few_shot_classification_model = few_shot_classification_model.to(torch.device("cuda"))
few_shot_classification_tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

classifier = ZeroShotClassificationPipeline(model=few_shot_classification_model, tokenizer=few_shot_classification_tokenizer, device=0, multi_label=False)

# 4 sequences to classify against 3000 candidate labels.
words = ["hello", "you", "I", "am", "beautiful", "and", "we", "like", "sugar"]
utterances = [" ".join(w) for w in list(itertools.permutations(words))[:4]]
contexts = [" ".join(w) for w in list(itertools.permutations(words))[:3000]]

classifier(utterances, contexts)
```

The model takes 2.9 GB of GPU RAM (plus a few small allocations). The GPU RAM does not change at all during inference, while CPU RAM usage grows a lot (the large number of contexts here is just for the sake of illustration).

In my case, the CPU RAM goes from 5 GB to 22.6 GB while the GPU RAM stays the same.
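A minimal sketch of how both numbers can be watched around the call (assuming `psutil` is installed; the `report_memory` helper is only illustrative):

```python
import psutil
import torch

def report_memory(tag: str) -> None:
    # Resident CPU memory of this process, and memory PyTorch has allocated on the GPU.
    cpu_gb = psutil.Process().memory_info().rss / 1024 ** 3
    gpu_gb = torch.cuda.memory_allocated() / 1024 ** 3
    print(f"{tag}: CPU {cpu_gb:.1f} GB | GPU {gpu_gb:.1f} GB")

report_memory("before inference")
classifier(utterances, contexts)
report_memory("after inference")
```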

Expected behavior

I was expecting a CUDA out-of-memory error instead of a huge load on CPU RAM.

Maybe the model is computing on the CPU because of a bad initialization?

Thanks a lot in advance. Have a great day.

Narsil commented 2 years ago

@Ierezell ,

Preprocessing will always happen on the CPU, so it's not entirely surprising. There's no way to make preprocessing (tokenization) happen on the GPU, afaik.

Here you're using 3k sentences × 3k labels, so we're looking at 9M individual input_ids sequences that have to be generated.
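Roughly, the zero-shot pipeline builds one premise/hypothesis pair per (sequence, label) combination before anything reaches the GPU. A minimal sketch of that expansion, assuming the default `hypothesis_template` ("This example is {}."):

```python
# Sketch of the zero-shot preprocessing step, assuming the default
# hypothesis_template "This example is {}.": every (sequence, label)
# combination becomes one premise/hypothesis pair, built in CPU memory.
hypothesis_template = "This example is {}."
pairs = [
    (utterance, hypothesis_template.format(label))
    for utterance in utterances
    for label in contexts
]
# len(pairs) == len(utterances) * len(contexts); only the tokenized tensors
# are moved to the GPU for the forward pass.
```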

Can you try doing this:

```python
from transformers.modeling_utils import PreTrainedModel
from transformers.models.auto.modeling_auto import AutoModelForSequenceClassification
from transformers.models.auto.tokenization_auto import AutoTokenizer
from transformers.pipelines.zero_shot_classification import ZeroShotClassificationPipeline
from transformers.tokenization_utils import PreTrainedTokenizer
import torch
import itertools

few_shot_classification_model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
few_shot_classification_model = few_shot_classification_model.to(torch.device("cuda"))
few_shot_classification_tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

classifier = ZeroShotClassificationPipeline(model=few_shot_classification_model, tokenizer=few_shot_classification_tokenizer, device=0, multi_label=False)

words = ["hello", "you", "I", "am", "beautiful", "and", "we", "like", "sugar"]

def utterances(words):
    # Yield the sequences lazily instead of materialising them all in a list.
    for w in itertools.permutations(words):
        yield " ".join(w)

contexts = [" ".join(w) for w in list(itertools.permutations(words))[:3000]]

classifier(utterances(words), contexts)
```

This should be easier on your RAM. Please note that `list(itertools.permutations(words))` is still creating 9! (362,880) objects for `contexts`.
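For scale, a quick check of that count (nothing transformers-specific, just the combinatorics):

```python
import itertools
import math

words = ["hello", "you", "I", "am", "beautiful", "and", "we", "like", "sugar"]

# A generator yields the permutations one at a time...
lazy_utterances = (" ".join(p) for p in itertools.permutations(words))

# ...whereas list(...) materialises all of them up front.
print(math.factorial(len(words)))  # 362880 permutations for 9 words
```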

ierezell commented 2 years ago

Hello @Narsil,

Thanks for the fast reply :)

It was my guess, but I'm happy to have the confirmation. I just didn't think that pre-processing could take that much memory (in the example it's obviously excessive).

As it's utterances × labels, the memory requirement can grow quite fast (in my case, 10 labels vs. 500). Using a generator does indeed save a good part of the memory. My fix was batching on the labels (contexts), sketched below.
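A minimal sketch of that label-batching approach (the `classify_in_label_chunks` helper and the chunk size are illustrative, not part of transformers; the chunks are scored with `multi_label=True` so each label's score is independent of how the labels are split):

```python
def classify_in_label_chunks(classifier, utterance, labels, chunk_size=100):
    # Score one utterance against the labels in chunks, so only chunk_size
    # (utterance, label) pairs are tokenized and held in memory at a time.
    scores = {}
    for start in range(0, len(labels), chunk_size):
        chunk = labels[start:start + chunk_size]
        result = classifier(utterance, chunk, multi_label=True)
        scores.update(zip(result["labels"], result["scores"]))
    # Best-scoring labels first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

top_labels = classify_in_label_chunks(classifier, " ".join(words), contexts)
```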

Thanks again for your time and help. Have a great day.