huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

Add IPEX-XPU support for Llama2 model inference #703

Open faaany opened 2 months ago

faaany commented 2 months ago

What does this PR do?

This PR enables Intel GPU support for Llama2 model inference in optimum-intel. Below is a code example:

import torch
from transformers import AutoTokenizer, pipeline
from optimum.intel import IPEXModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the checkpoint in float16; export=True converts it on load.
model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, export=True)

# Greedy decoding (no sampling, single beam) on the Intel GPU ("xpu") with the KV cache enabled.
pipe = pipeline("text-generation", model=model, device="xpu", tokenizer=tokenizer, do_sample=False, num_beams=1, use_cache=True)
results = pipe("He's a dreadful magician and")
print(results)
# Output:
# [{'generated_text': "He's a dreadful magician and he's always getting things wrong. But he's got a heart of gold and he's always trying his best.\n\nThe other magicians in the circus are not very nice to him. They make fun of him and call him names. But Mr. Higglebottom doesn't let it get him down. He just keeps on trying and practicing his magic tricks.\n\nOne day, the circus is in town and Mr. Higglebottom is given the chance to perform in front of a big audience. He's nervous but he's determined to do his best. And to everyone's surprise, he actually manages to pull off a few good tricks! The audience cheers and claps for him and he feels proud of himself.\n\nFrom that day on, Mr. Higglebottom is no longer the laughing stock of the circus. He's respected and admired by all the other performers and he's finally found his place in the circus. He's learned that it's okay to make mistakes and that with hard work and determination, anything is possible."}]
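
The example above hardcodes device="xpu", which presumably fails on a machine without an Intel GPU. As a minimal sketch (not part of this PR), the device could be chosen at run time instead. The helper name pick_device is hypothetical, and the sketch assumes a PyTorch build that may or may not expose torch.xpu:

```python
def pick_device() -> str:
    """Return "xpu" when an Intel GPU is usable, otherwise "cpu".

    Hypothetical helper for illustration; it is not provided by
    optimum-intel or by this PR.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on an accelerator.
        return "cpu"

    # torch.xpu is absent on some builds, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

The returned string could then be passed as the device argument of pipeline(...) in place of the literal "xpu".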

faaany commented 2 months ago

Hi @echarlaix, this PR is a joint effort by @jiqing-feng, @ganyi1996ppo, and me. Could you please help review it? Thanks a lot!

faaany commented 2 months ago

@yao-matrix

HuggingFaceDocBuilderDev commented 2 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.