@Ierezell ,
Preprocessing will always happen on the CPU, so it's not entirely surprising. There's no way to make preprocessing (tokenization) happen on the GPU, AFAIK.
Here you're using 3k sentences × 3k labels, so we're looking at 9M individual `input_ids` sequences that have to be generated.
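For context, the zero-shot pipeline turns each (sequence, label) pair into an NLI premise/hypothesis pair before tokenization, which is where the multiplication comes from. A minimal sketch of that expansion (not the actual pipeline code; the template shown is the pipeline's default hypothesis template):

```python
# Rough sketch of the expansion the zero-shot pipeline performs; not the real code.
sequences = ["hello you I am beautiful", "we like sugar"]
labels = ["greeting", "food"]
template = "This example is {}."  # the pipeline's default hypothesis template

# Every sequence is paired with every candidate label before tokenization.
pairs = [(seq, template.format(label)) for seq in sequences for label in labels]
print(len(pairs))  # len(sequences) * len(labels) -> with 3k x 3k, that's 9M pairs
```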
Can you try doing this:
```python
import itertools

import torch
from transformers.modeling_utils import PreTrainedModel
from transformers.models.auto.modeling_auto import AutoModelForSequenceClassification
from transformers.models.auto.tokenization_auto import AutoTokenizer
from transformers.pipelines.zero_shot_classification import ZeroShotClassificationPipeline
from transformers.tokenization_utils import PreTrainedTokenizer

few_shot_classification_model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
few_shot_classification_model = few_shot_classification_model.to(torch.device("cuda"))
few_shot_classification_tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

classifier = ZeroShotClassificationPipeline(
    model=few_shot_classification_model,
    tokenizer=few_shot_classification_tokenizer,
    device=0,
    multi_label=False,
)

words = ["hello", "you", "I", "am", "beautiful", "and", "we", "like", "sugar"]

def utterances(words):
    # Yield permutations lazily instead of materializing them all in memory.
    for w in itertools.permutations(words):
        yield " ".join(w)

contexts = [" ".join(w) for w in list(itertools.permutations(words))[:3000]]

classifier(utterances(words), contexts)
```
This should be easier on your RAM. Please note that `list(itertools.permutations(words))` still materializes all 9! (~363k) permutations before the `[:3000]` slice is taken.
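If that intermediate list is also a concern, `itertools.islice` takes the first 3000 permutations lazily without ever building the full list; a small sketch:

```python
import itertools

words = ["hello", "you", "I", "am", "beautiful", "and", "we", "like", "sugar"]

# Only the first 3000 permutations are ever materialized.
contexts = [" ".join(w) for w in itertools.islice(itertools.permutations(words), 3000)]
```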
Hello @Narsil,
Thanks for the fast reply :)
It was my guess, but I'm happy to have the confirmation. I just didn't think that preprocessing could take that much memory (in the example it's certainly too much).
Since it's utterances × labels, the memory requirement can rise quite fast (in my case 10 labels vs 500). Using a generator indeed saves a good amount of memory. My fix was to batch over the labels (contexts), as sketched below.
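A minimal sketch of that label-batching idea (the helper name, chunk size, and score merging are my assumptions, not code from this thread):

```python
# Hypothetical helper: run the classifier over the labels in chunks so only
# batch_size (utterance, label) pairs are expanded and tokenized at a time.
def classify_in_label_batches(classifier, utterance, labels, batch_size=50):
    scores = {}
    for start in range(0, len(labels), batch_size):
        chunk = labels[start:start + batch_size]
        result = classifier(utterance, chunk)
        scores.update(zip(result["labels"], result["scores"]))
    # Note: with multi_label=False each chunk's scores are softmaxed over that
    # chunk only, so the merged ranking across chunks is approximate.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```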
Thanks again for your time and help. Have a great day.
System Info
Who can help?
Hello @Narsil, sorry to bother you once again...
When using a `ZeroShotClassificationPipeline`, it seems that a lot of the preprocessing is done on the CPU instead of the GPU.
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The model takes 2.9GB of GPU RAM (plus a few other small allocations). The GPU RAM does not change at all during inference, but the CPU RAM usage grows a lot (with a deliberately large number of contexts, just for the sake of illustration).
In my case the CPU RAM goes from 5GB to 22.6GB while the GPU RAM stays the same.
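For reference, a minimal sketch of the kind of call that shows this behaviour (the sentence and label counts are placeholders, not my exact script):

```python
from transformers import pipeline

# Hypothetical reproduction; the counts below are placeholders.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=0)

utterances = [f"some sentence {i}" for i in range(3000)]
contexts = [f"label {j}" for j in range(3000)]

# CPU RSS climbs during preprocessing while GPU memory stays flat.
classifier(utterances, contexts)
```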
Expected behavior
I was expecting a CUDA out-of-memory error instead of a huge load on CPU RAM.
Maybe the model is computing on the CPU because of a bad initialization?
Thanks a lot in advance. Have a great day.