Add VLM examples, bugfix

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

https://intel.github.io/neural-compressor/

Apache License 2.0

2.18k stars 252 forks

Open WeiweiZhang1 opened 5 days ago

WeiweiZhang1 commented 5 days ago

feature or bug fix or documentation or validation or others
API changed or not

detailed description

the expected behavior triggered by this PR

how to reproduce the test (including hardware information)

any library dependency introduced or removed