stochasticai / xTuring

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
https://xturing.stochastic.ai
Apache License 2.0
2.61k stars · 207 forks

Integrate ITREX to support popular compression algorithms and highly optimized kernels #263

Closed yiliu30 closed 1 year ago

yiliu30 commented 1 year ago

First PR for https://github.com/stochasticai/xTuring/issues/264

Usage

from xturing.models import BaseModel

# Specify the quantization configuration
from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig
woq_config = WeightOnlyQuantConfig(weight_dtype='int8')
model = BaseModel.create("gpt2", quantization_config=woq_config)

# Run inference with ITREX's highly optimized kernels
output = model.generate(texts=["Why are the LLM models important?"])
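For intuition, here is a minimal, illustrative sketch (not ITREX code) of the symmetric per-tensor int8 weight-only scheme that `WeightOnlyQuantConfig(weight_dtype='int8')` selects: only the weights are stored as int8 plus a float scale, while activations stay in floating point. Function names here are hypothetical.

```python
# Illustrative sketch of weight-only int8 quantization (assumption:
# symmetric per-tensor scaling; ITREX's actual kernels are more elaborate).

def quantize_int8(weights):
    """Map float weights to int8 values plus a single float scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for the matmul."""
    return [v * scale for v in q]

w = [0.52, -1.27, 0.03, 0.8]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Storing `q` (1 byte per weight) instead of float32 cuts weight memory roughly 4x, at the cost of a small reconstruction error bounded by half the scale.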

TODO

@StochasticRomanAgeev @tushar2407

StochasticRomanAgeev commented 1 year ago

Hi @yiliu30, thanks for the PR! First question: what does this approach offer over the int8 versions of models we already support?

yiliu30 commented 1 year ago

Created https://github.com/stochasticai/xTuring/pull/268 for this integration; closing this one first.