intel / llm-on-ray

Pretrain, finetune and serve LLMs on Intel platforms with Ray
Apache License 2.0

[Inference] Add validated models for Gaudi #225

Closed. Deegue closed this 4 months ago.

Deegue commented 6 months ago

Model list:

| Model | Cards | Chat template |
| --- | --- | --- |
| bloom-7b1 | single card | without template |
| Falcon-7b | single card | without template |
| Falcon-40b | multiple cards | without template |
| Gemma-2b | single card | without template |
| Llama3-7b | single card | unknown |
| Llama3-70b | multiple cards | unknown |
| Mistral-7b | single card | without template |
| Mixtral-8x7B-Instruct-v0.1 | single card | with template |
| llama-2-7b | single card | unknown |
| llama-2-70b | multiple cards | unknown |
| CodeLlama | single card | unknown |
| GPT2 | single card | without template |
| GPT-J | single card | without template |
| MPT-7b | single card | without template |
| Qwen1.5-110B | single card | with template |
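(For context, the "with template" / "without template" column corresponds to whether the model's tokenizer ships a chat template. Below is a minimal sketch of how that distinction is typically handled when building prompts; it is illustrative only, and `build_prompt` is a hypothetical helper, not this PR's code.)

```python
# Sketch: prompt construction depending on whether a chat template exists.
# Illustrative only; not the actual llm-on-ray code path.
from transformers import AutoTokenizer


def build_prompt(model_id: str, messages: list[dict]) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if tokenizer.chat_template is not None:
        # "with template": the tokenizer ships a chat template
        # (e.g. Mixtral-8x7B-Instruct-v0.1, Qwen1.5-110B).
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # "without template": fall back to concatenating message contents
    # as a plain prompt (e.g. GPT2, bloom-7b1).
    return "\n".join(m["content"] for m in messages)
```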

Deegue commented 5 months ago

All CI passed. Gentle ping @carsonwang for review, thanks!

carsonwang commented 5 months ago

@kira-lin is helping to review this. For qwen, can you please update to use Qwen/Qwen2-7B-Instruct?

Deegue commented 5 months ago

> @kira-lin is helping to review this. For qwen, can you please update to use Qwen/Qwen2-7B-Instruct?

Added Qwen1.5-7B-Chat and Qwen2-7B-Instruct.
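(For reference, once one of these models is deployed, a request might look like the following. This is a sketch assuming llm-on-ray exposes an OpenAI-compatible route on localhost:8000; the URL, route, and deployment name are assumptions, not confirmed defaults.)

```python
# Sketch: query a served model through an assumed OpenAI-compatible endpoint.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed default route
    json={
        "model": "Qwen2-7B-Instruct",  # assumed deployment name
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```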

kira-lin commented 5 months ago

For falcon and qwen, @KepingYan can look into this.

Deegue commented 5 months ago

> https://github.com/intel/llm-on-ray/blob/cdce225cb2285baf9c151fecc7b6af853412e030/llm_on_ray/inference/predictors/hpu_predictor.py#L340
>
> Let's modify this line according to: https://github.com/huggingface/optimum-habana/blob/595cc3e4ec219b1ce469b323cf94e994c5c5d8f3/examples/text-generation/utils.py#L311-L312

Updated, thanks for the comment. BTW, are there any other places that need to be changed? I noticed some other spots are special-cased via `model_type == llama`.
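(For illustration, the kind of `model_type` gating referred to above usually looks like the sketch below. This is hedged: the flag shown is one that optimum-habana's text-generation example uses for Llama-family models on Gaudi, not necessarily what hpu_predictor.py does.)

```python
# Sketch of model_type-specific handling, similar in spirit to the branches
# mentioned above; not the actual hpu_predictor.py logic.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

generate_kwargs = {}
if config.model_type == "llama":
    # optimum-habana's text-generation example enables extra Gaudi-specific
    # generation options for Llama-family models; attn_softmax_bf16 is one
    # such flag (assumed here for illustration).
    generate_kwargs["attn_softmax_bf16"] = True
```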