IPEX with huggingface zero-shot-classification pipeline?

pmusser commented 7 months ago

Describe the issue

Not sure if this is the right place for broader questions (if not, I welcome being pointed in the right direction), but are there plans or work already ongoing toward using IPEX with HF pipelines like zero-shot-classification? I'm admittedly new to working with LLMs, and doubly so to running them locally, so this may not actually be an issue, but right now that's my big use case, and pipelines made it shockingly easy for me to get started.

When I first tried working with hf locally a month ago I didn't realize that I needed oneAPI to make my Arc A770 LE suitable for the task, so I ended up digging out an old GTX 1660 Super and sideloading it, which was a massive improvement over running the models on my CPU. Since then though, while trying to get ipex working I discovered that installing oneAPI let me set xpu as the device, which also squeezed out even more performance (about 25% faster completion on the Arc than on the GTX) without managing to integrate ipex into the code.

However, I wasn't sure if the 25% gain aligned with how much performance improvement I should expect, or if there are some optimizations built into ipex that might eke out even more (if I could just figure out how to integrate it). I've tried to cobble together some changes to my code based on the ipex documentation (and now that blog post that came out a few days back) to get it integrated, but I admit that I may just not know what I'm doing well enough to know if that's even a task worth undertaking.

For reference, this is the code I'm testing with (albeit with a subset of candidate_labels to keep it short):

import torch
import transformers
import time
from transformers import pipeline

model = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"
classifier = pipeline("zero-shot-classification",
                      model=model, 
                      framework="pt",
                      device="xpu")

combined = []

# start timing how long it takes to run this cell
start_time = time.time()

sequence_to_classify = "Accelerating Systemic Change in Higher Education Long Description: This volume of Transforming Institutions follows from and builds on its predecessor of five years ago (Weaver et al., 2015) with a mix of case studies, models, and analyses. The authors and editors provide key perspectives for advancing change initiatives in higher education and STEM education. The Transforming Institutions conferences and book series began with the first convening in 2011 at Purdue University, organized by the Discovery Learning Research Center (DLRC), and continues with the 2019 and 2021 Transforming Institutions Conferences. The meeting sought then, as it still does, to bring together researchers, academic leaders, national organizations and funding agency representatives to discuss the practical aspects of changing institutional practices to align with the large body of evidence in the field. The editors and authors of this volume consider this work to be a beginning and hope it will be a call to action for every reader. See more at: https://ascnhighered.org/ASCN/ti_19_book.htm ISBN: 9798565063523"
candidate_labels = ['Social Science -- Psychology', 'Arts and Humanities -- Performing Arts', 'Mathematics', 'Mathematics -- Ratios and Proportions', 'Physical Science -- Oceanography', 'Life Science -- Genetics', 'Arts and Humanities', 'Life Science', 'Arts and Humanities -- Art History', 'Physical Science -- Geology', 'Arts and Humanities -- Philosophy', 'Social Science -- Political Science', 'Social Science -- Gender and Sexuality Studies', 'Social Science -- Ethnic Studies', 'Social Science -- Cultural Geography', 'English Language Arts -- Reading Foundation Skills', 'English Language Arts -- Reading Informational Text', 'Applied Science ', 'Education -- Elementary Education', 'Business and Communication -- Marketing', 'Education -- Early Childhood Development', 'Social Science', 'Social Science -- Anthropology', 'English Language Arts -- Speaking and Listening', 'History', 'Applied Science -- Environmental Science', 'English Language Arts -- Language, Grammar and Vocabulary', 'Business and Communication -- Accounting', 'Career and Technical Education -- Agriculture', 'Mathematics -- Measurement and Data', 'Career and Technical Education -- Maritime Science']
results = classifier(sequence_to_classify, candidate_labels)

# end timing this cell's run time
end_time = time.time()

# combined = zip(results['labels'],results['scores'])
combined = [(label, format(score * 100, ".2f")) for label, score in zip(results['labels'], results['scores'])]
print(list(combined))
total_time = end_time - start_time
print(f"Total time taken: {total_time} seconds")

When I run this with the full list of candidate labels it takes about 5 seconds (heck yeah), and when I expand it out to a bulk operation on 1900 similar records it takes about an hour and 40 minutes (I love that).
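For reference, those bulk figures work out to roughly 3 seconds per record. A quick sketch of the arithmetic, using only the numbers quoted in this post (the variable names are just for illustration):

```python
# Figures quoted in the post; everything else is derived arithmetic.
num_records = 1900
bulk_seconds = 1 * 3600 + 40 * 60   # "about an hour and 40 minutes"

seconds_per_record = bulk_seconds / num_records
records_per_minute = 60 / seconds_per_record

print(f"Bulk run: {seconds_per_record:.2f} s per record")
print(f"Throughput: {records_per_minute:.1f} records per minute")
```

These are approximate, since both timings in the post are rounded.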

TLDR: which question is the right one for me to ask:

Is this the best I can expect?
Would figuring out how to add in ipex improve performance?
Is this even a valid use case of ipex, or have I just fallen down a rabbit hole that's not actually relevant?

vishnumadhu365 commented 6 months ago

@pmusser Hey Peter. First, to your questions:

Is this the best I can expect?

Since you are batch processing ~1900 records, we might have room to improve the throughput from that A770.

Would figuring out how to add in ipex improve performance?

Adding IPEX should be rather easy; I have extended your code below.

Is this even a valid use case of ipex, or have I just fallen down a rabbit hole that's not actually relevant?

Very much a valid use case for IPEX.

Summarizing the changes made to the original code snippet:

Enable IPEX optimizations for BF16.
The zero-shot-classification pipeline supports batching input sequences (provided as a list) and also the candidate labels within each sequence. Enabling batching helps improve throughput.
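A rough way to reason about the batch size is sketched below. It assumes, based on my reading of the transformers implementation and not on anything confirmed in this thread, that the pipeline expands every sequence into one premise/hypothesis pair per candidate label and that batch_size counts those pairs. The helper name count_forward_items is hypothetical, and the template string is the pipeline's default as far as I know.

```python
import math

def count_forward_items(num_sequences: int, num_labels: int, batch_size: int):
    """Return (total premise/hypothesis pairs, number of batches)."""
    total_pairs = num_sequences * num_labels
    num_batches = math.ceil(total_pairs / batch_size)
    return total_pairs, num_batches

# Illustration of the assumed pair expansion for one sequence and two labels.
hypothesis_template = "This example is {}."
labels = ["Mathematics", "History"]
pairs = [("some text", hypothesis_template.format(label)) for label in labels]

# 1900 records, the 31 labels in the snippet, batch size 512.
total_pairs, num_batches = count_forward_items(1900, 31, 512)
print(total_pairs, num_batches)
```

If that assumption holds, a batch of 512 covers only about 16 full records at 31 labels each, so the best batch size likely depends on both the label count and the available GPU memory.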

Give the below approach a try.

import torch
import intel_extension_for_pytorch as ipex
import transformers
import time
from transformers import pipeline

model = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"
classifier = pipeline("zero-shot-classification",
                      model=model, 
                      framework="pt",
                      device="xpu",
                      )

classifier.model = classifier.model.to(memory_format=torch.channels_last)
classifier.model = ipex.optimize(classifier.model, inplace=True, dtype=torch.bfloat16)

sequence_batch = 512
num_records = 1900

combined = []

# start timing how long it takes to run this cell
start_time = time.time()

# To-Do: Replace the list below with the actual sequences. For ease of testing, the same sequence is repeated 1900 times.
sequence_to_classify = ["Accelerating Systemic Change in Higher Education Long Description: This volume of Transforming Institutions follows from and builds on its predecessor of five years ago (Weaver et al., 2015) with a mix of case studies, models, and analyses. The authors and editors provide key perspectives for advancing change initiatives in higher education and STEM education. The Transforming Institutions conferences and book series began with the first convening in 2011 at Purdue University, organized by the Discovery Learning Research Center (DLRC), and continues with the 2019 and 2021 Transforming Institutions Conferences. The meeting sought then, as it still does, to bring together researchers, academic leaders, national organizations and funding agency representatives to discuss the practical aspects of changing institutional practices to align with the large body of evidence in the field. The editors and authors of this volume consider this work to be a beginning and hope it will be a call to action for every reader. See more at: https://ascnhighered.org/ASCN/ti_19_book.htm ISBN: 9798565063523"] * num_records

candidate_labels = ['Social Science -- Psychology', 'Arts and Humanities -- Performing Arts', 'Mathematics', 'Mathematics -- Ratios and Proportions', 'Physical Science -- Oceanography', 'Life Science -- Genetics', 'Arts and Humanities', 'Life Science', 'Arts and Humanities -- Art History', 'Physical Science -- Geology', 'Arts and Humanities -- Philosophy', 'Social Science -- Political Science', 'Social Science -- Gender and Sexuality Studies', 'Social Science -- Ethnic Studies', 'Social Science -- Cultural Geography', 'English Language Arts -- Reading Foundation Skills', 'English Language Arts -- Reading Informational Text', 'Applied Science ', 'Education -- Elementary Education', 'Business and Communication -- Marketing', 'Education -- Early Childhood Development', 'Social Science', 'Social Science -- Anthropology', 'English Language Arts -- Speaking and Listening', 'History', 'Applied Science -- Environmental Science', 'English Language Arts -- Language, Grammar and Vocabulary', 'Business and Communication -- Accounting', 'Career and Technical Education -- Agriculture', 'Mathematics -- Measurement and Data', 'Career and Technical Education -- Maritime Science']

results = classifier(sequence_to_classify, candidate_labels, batch_size=sequence_batch) 

# end timing this cell's run time
end_time = time.time()

total_time = end_time - start_time
print(f"Sequence batch : {sequence_batch}")
print(f"Total time taken: {total_time} seconds")
print(f"Time per sequence: {total_time/(num_records)} seconds")

# To-Do : Since we are computing batches, modify below line to iterate over the result to get the scores
combined = [(label, format(score * 100, ".2f")) for label, score in zip(results[0]['labels'], results[0]['scores'])]
print(list(combined))
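One possible way to handle the last To-Do above is sketched here. It assumes the pipeline returns one dict per input sequence with 'labels' and 'scores' keys, which is the shape the original single-sequence code already relies on. The function name format_results and the mock data are made up for illustration.

```python
def format_results(results):
    """Map each sequence's result to a list of (label, percent string) tuples."""
    # A single input yields one dict; a list of inputs yields a list of dicts.
    if isinstance(results, dict):
        results = [results]
    return [
        [(label, format(score * 100, ".2f"))
         for label, score in zip(result["labels"], result["scores"])]
        for result in results
    ]

# Mock output in the assumed shape (values are invented, not real model scores).
mock_results = [
    {"sequence": "text A", "labels": ["History", "Mathematics"], "scores": [0.75, 0.25]},
    {"sequence": "text B", "labels": ["Mathematics", "History"], "scores": [0.5, 0.5]},
]
for row in format_results(mock_results):
    print(row)
```

With the real pipeline, format_results(results) would replace the line that only reads results[0].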

pmusser commented 5 months ago

@vishnumadhu365 thanks for your help! I ran the code as above with a smaller sample set and it worked a dream, but the following warning popped up twice:

C:\Users\Peter\.conda\envs\llm\lib\site-packages\intel_extension_for_pytorch\frontend.py:465: UserWarning: Conv BatchNorm folding failed during the optimize process.
  warnings.warn(

vishnumadhu365 commented 5 months ago

@pmusser Great that it worked for you! By the way, did you observe a significant improvement in throughput? I wouldn't be surprised if the A770 churned through 1900 samples in less than 10 minutes ;).

Regarding "UserWarning: Conv BatchNorm folding failed during the optimize process.": it's a warning that's safe to ignore.
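If the repeated message is distracting, it can be filtered with the standard library. This is only a sketch: the function name silence_folding_warning is hypothetical, and the filter would need to be installed before the ipex.optimize call that emits the warning.

```python
import warnings

def silence_folding_warning():
    """Hide only the Conv/BatchNorm folding message; other warnings still show."""
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="Conv BatchNorm folding failed",
        category=UserWarning,
    )

silence_folding_warning()
```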

pmusser commented 5 months ago

It worked on a few smaller samples, but when I tried running it on a larger sample set I think the GPU's memory overflowed and led to a hard crash. I haven't had a chance to run it again and check the Windows error logs, but I will do so ASAP and get back to ya! I had similar issues before getting IPEX running, which I reported here.
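If memory pressure really is the cause (that is a guess in this thread, not a confirmed diagnosis), one thing to try is feeding the records to the pipeline in smaller chunks with a smaller batch_size. The sketch below is hypothetical: classify_in_chunks and chunked are made-up helper names, the chunk_size and batch_size values are untuned placeholders, and a stand-in function takes the place of the real classifier so the snippet runs without a GPU.

```python
from typing import Callable, Iterator, List, Sequence

def chunked(items: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

def classify_in_chunks(classify: Callable, sequences: Sequence[str],
                       candidate_labels: List[str],
                       chunk_size: int = 100, batch_size: int = 16) -> list:
    """Call classify once per chunk and collect the per-sequence results."""
    all_results = []
    for chunk in chunked(sequences, chunk_size):
        all_results.extend(classify(list(chunk), candidate_labels, batch_size=batch_size))
    return all_results

# Stand-in for the real pipeline object; returns one dict per input sequence.
def fake_classifier(sequences, candidate_labels, batch_size=1):
    return [{"sequence": s, "labels": candidate_labels, "scores": [1.0]} for s in sequences]

results = classify_in_chunks(fake_classifier, ["text"] * 250, ["History"])
print(len(results))
```

With the real pipeline, classifier would be passed in place of fake_classifier. Collecting results per chunk also makes it possible to save partial output before a crash.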

devpramod commented 1 month ago

Hi @pmusser Are you looking for further assistance in this case?

devpramod commented 1 month ago

Closed due to inactivity

intel / intel-extension-for-pytorch

IPEX with huggingface zero-shot-classification pipeline? #547
