sony / model_optimization

Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for deployment on efficient, resource-constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks.
https://sony.github.io/model_optimization/
Apache License 2.0
331 stars, 53 forks

Add initial Sample-Layer Attention for GPTQ (PyTorch) #1237

Closed. irenaby closed this 1 month ago

irenaby commented 1 month ago

Pull Request Description:

Add Hessian estimation per image hash. Add a sample-layer attention distillation loss. Add per-layer weights to the soft round loss. Update the GPTQ config and its generation for sample-layer attention.
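
For readers unfamiliar with the idea, the first two changes can be illustrated with a rough sketch. It assumes that "per image hash" means Hessian-based scores are stored under a hash of each input sample, and that the sample-layer attention loss weights each sample's per-layer error between float and quantized activations by that sample's score for the layer. The names `image_hash` and `sla_loss`, the data layout, and the plain-Python arithmetic are all hypothetical and are not taken from the MCT code in this PR.

```python
import hashlib
from typing import Dict, List, Sequence


def image_hash(pixels: Sequence[float]) -> str:
    """Hypothetical stable key for one sample, built from its pixel values."""
    data = ",".join(f"{p:.6f}" for p in pixels).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def sla_loss(
    float_acts: List[List[List[float]]],
    quant_acts: List[List[List[float]]],
    sample_keys: List[str],
    hessian_scores: Dict[str, List[float]],
) -> float:
    """Assumed form of a sample-layer weighted distillation loss.

    float_acts[l][s] and quant_acts[l][s] are the activation vectors of
    layer l for sample s. hessian_scores[key][l] is a precomputed score
    for the sample identified by key at layer l.
    """
    total = 0.0
    for layer, (f_layer, q_layer) in enumerate(zip(float_acts, quant_acts)):
        for s, key in enumerate(sample_keys):
            f, q = f_layer[s], q_layer[s]
            # Mean squared error between float and quantized activations.
            mse = sum((a - b) ** 2 for a, b in zip(f, q)) / len(f)
            # Weight the error by this sample's score for this layer.
            total += hessian_scores[key][layer] * mse
    # Average over the samples in the batch.
    return total / len(sample_keys)
```

In this sketch the scores are looked up by hash rather than by batch position, so they could be estimated once per image and reused across epochs regardless of shuffling. Whether the PR uses the scores this way is an assumption.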

Checklist before requesting a review: