yuwenzho commented 3 months ago

Type of Change

feature API changed or not: no

Description

Use different WeightOnlyLinear module according to device.

Abstract WeightOnlyLinear class. Inherited class INCWeightOnlyLinear and HPUWeighOnlyLinear
Load woq linear weight module by module
save hpu format tensor to reuse it once load it again: huggingface format save to local 'hpu_model.safetensor' file; default format save to 'quantized_hpu_weight.pt' file

load huggingface WOQ model example:

from neural_compressor.torch.quantization import load

model_id = "TheBloke/TinyLlama-1.1B-python-v0.1-GPTQ"
# first load: torch.nn.Linear -> INCWeightOnlyLinear -> HPUWeightOnlyLinear, 
# and then save hpu_model.safetensors to local cache dir
qmodel = load(model_name_or_path=model_id, format="huggingface", device="hpu")

# second load: torch.nn.Linear -> HPUWeightOnlyLinear using hpu_model.safetensors saved in local cache dir
qmodel = load(model_name_or_path=model_id, format="huggingface", device="hpu")

load INC WOQ model example:

from neural_compressor.torch.quantization import load

# first load: torch.nn.Linear -> INCWeightOnlyLinear -> HPUWeightOnlyLinear, 
# and then save quantized_hpu_weight.pt to 'saved_results' dir
qmodel = load("saved_results", original_model=fp32_model, device="hpu")

# second load: torch.nn.Linear -> HPUWeightOnlyLinear using quantized_hpu_weight.pt saved in 'saved_results' dir
qmodel = load("saved_results", original_model=fp32_model, device="hpu")

How has this PR been tested?

CI

Dependency Change?

No

github-actions[bot] commented 3 months ago

⛈️ Required checks status: Has failure 🔴

Warning If you do not have the access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflow will be re-triggered.

Groups summary

🟢 Code Scan Tests workflow

| Check ID | Status | Error details | | | -------- | ------ | ---- | --- | | [Code-Scan](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31946) | success | | ✅ | | [Code-Scan (Bandit Code Scan Bandit)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31946&view=logs&jobId=343c57fa-283e-589b-e772-0a0553c93e53) | success | | ✅ | | [Code-Scan (DocStyle Code Scan DocStyle)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31946&view=logs&jobId=c1e234ec-db76-5d40-e8f0-e1ad3ef905a3) | success | | ✅ | | [Code-Scan (Pylint Code Scan Pylint)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31946&view=logs&jobId=454075da-6b11-57a5-edf2-4c5947924fa8) | success | | ✅ | These checks are required after the changes to `neural_compressor/torch/algorithms/weight_only/gptq.py`, `neural_compressor/torch/algorithms/weight_only/modules.py`, `neural_compressor/torch/algorithms/weight_only/rtn.py`, `neural_compressor/torch/algorithms/weight_only/save_load.py`, `neural_compressor/torch/quantization/load_entry.py`, `neural_compressor/torch/utils/environ.py`, `neural_compressor/torch/utils/utility.py`.

🟢 Model Tests 3x workflow

| Check ID | Status | Error details | | | -------- | ------ | ---- | --- | | [Model-Test-3x](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31948) | success | | ✅ | | [Model-Test-3x (Generate Report GenerateReport)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31948&view=logs&jobId=131b5a5d-c16f-50a4-e704-41ef17f1e502) | success | | ✅ | | [Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31948&view=logs&jobId=e3333e26-4334-5d5e-a7bd-ee7471860e42) | success | | ✅ | | [Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_bnb)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31948&view=logs&jobId=2ce7611e-1b4a-5a32-2344-8e2c635b00bd) | success | | ✅ | | [Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_ggml)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31948&view=logs&jobId=919e49eb-e265-548b-eaa7-50d75ec3f230) | success | | ✅ | These checks are required after the changes to `neural_compressor/torch/algorithms/weight_only/gptq.py`, `neural_compressor/torch/algorithms/weight_only/modules.py`, `neural_compressor/torch/algorithms/weight_only/rtn.py`, `neural_compressor/torch/algorithms/weight_only/save_load.py`, `neural_compressor/torch/quantization/load_entry.py`, `neural_compressor/torch/utils/environ.py`, `neural_compressor/torch/utils/utility.py`.

🔴 Unit Tests 3x-PyTorch workflow

| Check ID | Status | Error details | | | -------- | ------ | ---- | --- | | [UT-3x-Torch](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31947) | failure | | ❌ | | [UT-3x-Torch (Coverage Compare CollectDatafiles)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31947&view=logs&jobId=71384379-497b-5787-1f51-cc2e0f831d78) | failure | [download](https://artprodcus3.artifacts.visualstudio.com/Acd5c2212-3bfc-4706-9afe-b292ced6ae69/b7121868-d73a-4794-90c1-23135f974d09/_apis/artifact/cGlwZWxpbmVhcnRpZmFjdDovL2xwb3QtaW5jL3Byb2plY3RJZC9iNzEyMTg2OC1kNzNhLTQ3OTQtOTBjMS0yMzEzNWY5NzRkMDkvYnVpbGRJZC8zMTk0Ny9hcnRpZmFjdE5hbWUvVVRfY292ZXJhZ2VfcmVwb3J0XzN4X3B00/content?format=zip) | ❌ | | [UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31947&view=logs&jobId=ce119872-54c8-5686-93fc-b763560515d2) | success | | ✅ | | [UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline)](https://dev.azure.com/lpot-inc/b7121868-d73a-4794-90c1-23135f974d09/_build/results?buildId=31947&view=logs&jobId=337dfcaa-e49a-58ed-d835-0606580c9539) | success | | ✅ | These checks are required after the changes to `neural_compressor/torch/algorithms/weight_only/gptq.py`, `neural_compressor/torch/algorithms/weight_only/modules.py`, `neural_compressor/torch/algorithms/weight_only/rtn.py`, `neural_compressor/torch/algorithms/weight_only/save_load.py`, `neural_compressor/torch/quantization/load_entry.py`, `neural_compressor/torch/utils/environ.py`, `neural_compressor/torch/utils/utility.py`, `test/3x/torch/quantization/weight_only/test_autoround.py`, `test/3x/torch/quantization/weight_only/test_awq.py`, `test/3x/torch/quantization/weight_only/test_gptq.py`, `test/3x/torch/quantization/weight_only/test_load.py`, `test/3x/torch/quantization/weight_only/test_load_woq_hf_model.py`, `test/3x/torch/quantization/weight_only/test_rtn.py`.

Thank you for your contribution! 💜

Note This comment is automatically generated and will be updates every 180 seconds within the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.

Kaihui-intel commented 3 months ago

Abstract WeightOnlyLinear class. Inherited class INCWeightOnlyLinear and HPUWeighOnlyLinear For cpu, how does the woq algorithm use abstract class WeightOnlyLinear? Do we use INCweightonlinear instead of WeightOnlyLinear?